Abstract


The  success  of  Moore’s  Law  has  conditioned  the  semiconductor  industry  to  expect 
continuing  improvements  in  high  performance  chips.  Limits  to  the  power  reduction 
that can be realized with traditional digital design provide motivation for studying
probabilistic computing and inexact methods that offer potential energy savings,
performance  improvements,  and  area  improvement.  This  dissertation  addresses  how 
much  energy  and  power  can  be  saved  if  one  accepts  additional  tradeoffs  in  accuracy 
(and  thus  advantages  in  power  consumption,  and  decreased  heat  removal). 

This  work  advances  the  state  of  the  art  of  inexact  computing  by  optimizing  the 
JPEG  File  Interchange  Format  (JFIF)  compression  algorithm  for  reduced  energy, 
delay,  and  area.  The  dissertation  presents  a  demonstration  of  inexact  computing 
implemented  in  the  JPEG  algorithm  applied  to  an  analysis  of  uncompressed  TIFF 
images  of  a  U.S.  Air  Force  F-16  aircraft  provided  by  the  University  of  Southern 
California Signal and Image Processing Institute (SIPI) image database. The JPEG
algorithm  is  selected  as  a  motivational  example  because  it  is  widely  available  to  the 
U.S.  Air  Force  community  and  is  widely  used  in  many  areas  including  the  military, 
education,  business,  and  users  of  personal  electronics.  The  JPEG  algorithm  is  also 
selected  because  it  is  by  its  nature  a  lossy  compression  algorithm,  where  the  existence 
of loss indicates the user’s willingness to accept error.

The approach of this research is to predict the performance of CMOS components
as implemented in solving problems of probabilistic Boolean logic, with a specific focus
on the most advanced silicon CMOS technology currently in high-volume manufacturing
(the 14 nm FinFET silicon CMOS technology). The main contribution
of this research is a method to quantify the energy savings resulting from the decision
to accept a specified percentage of error in some components of a computing
system. The dissertation presents a demonstration of the JPEG algorithm in which
two components of the algorithm (namely the first and third steps, color space transformation
and discrete cosine transform) take advantage of the reduced energy and
power that can be achieved when one accepts a certain amount of inexactness in the
result. Detailed studies of energy-accuracy tradeoffs in adders and multipliers are
presented.

We have shown that we could cut energy demand in half with 16-bit Kogge-Stone
adders that deviated from the correct value by an average of 3.0 percent in 14 nm
CMOS FinFET technology, assuming a noise amplitude of 3 × 10^-12 V^2/Hz. This
was achieved by reducing the supply voltage Vdd to 0.6 V instead of its maximum
value of 0.8 V. The energy-delay product (EDP) was reduced by 38 percent.

Adders  that  got  wrong  answers  with  a  larger  deviation  of  about  7.5  percent  (using 
Vdd  =  0.5  V)  were  up  to  3.7  times  more  energy-efficient,  and  the  EDP  was  reduced 
by  45  percent. 

Adders  that  got  wrong  answers  with  a  larger  deviation  of  about  19  percent  (using 
Vdd = 0.3 V) were up to 13 times more energy-efficient, and the EDP was reduced
by  35  percent. 

We used inexact adders and inexact multipliers to perform the color space transform,
and found that with a 1 percent probability of error at each logic gate, the
letters  “F-16”,  which  are  14  pixels  tall,  and  “U.S.  AIR  FORCE”,  which  are  8  to  10 
pixels  tall,  are  readable  in  the  processed  image,  where  the  relative  RMS  error  is  5.4 
percent. 

We  used  inexact  adders  and  inexact  multipliers  to  perform  the  discrete  cosine 
transform,  and  found  that  with  a  1  percent  probability  of  error  at  each  logic  gate, 
the  letters  “F-16”,  which  are  14  pixels  tall,  and  “U.S.  AIR  FORCE”,  which  are  8  to 
10  pixels  tall,  are  readable  in  the  processed  image,  where  the  relative  RMS  error  is 
20 percent.

Results presented in this dissertation show that 91.8% of the color space transform
can be built from inexact components and that 92.0% of the discrete cosine
transformation  can  be  built  from  inexact  components.  The  results  show  that,  for  the 
case  in  which  the  probability  of  correctness  is  99%,  the  color  space  transformation  has 
55%  energy  reduction  per  pixel  of  an  uncompressed  image,  and  the  discrete  cosine 
transformation step also has a 55% energy reduction per pixel.

Ninexact  number of bits which are computed inexactly
bit  width  of  floating-point  exponent 
N-tn  bit  width  of  floating-point  mantissa 

~ N(μ, σ^2)  is normally distributed with mean μ and variance σ^2

P  product  of  a  multiplier 

P  approximate  product 

Pk  kth partial product of a multiplier

P(-)  probability 

V  power  dissipated 

Vd  dynamic  power 

Vs  static  power 

p  probability  of  correctness 

pk^i  ith  bit  of  Pk 

Q  quantized  form  of  C 

Q  zigzag  arrangement  of  Q 

Qij  (i,  j)th  element  of  Q 

R  compression  ratio 

V  red  component 

S  N-bit sum of an adder, excluding the carry-out bit

S+  (N + 1)-bit augmented sum of an adder, including the carry-out bit
T  clock  period 

t  time

U  unitary discrete cosine transformation matrix

~ Unif(a, b)  is uniformly distributed between a and b
Vdd  power  supply  voltage 

w  vector  of  all  noise  sources  within  an  inexact  circuit 

Win  noise  at  the  input  node  of  a  circuit 

Wout  noise  at  the  output  node  of  a  circuit 

X  8x8 tile containing image (Y, Cb, or Cr) data

X  input  vector  to  a  digital  logic  circuit 

Y  output  of  a  digital  logic  circuit 

y  luminance 

Z  quantization  factor  matrix 

Zij  element  of  Z 

Subscripts 

d  dynamic  (as  in  dynamic  power) 

i  ith  bit  position 

in  input 

k  kth  stage  of  a  circuit 

l  lth state in a sequence

max  maximum 

min  minimum 

out  output 

rms  root-mean-square  average 

s  static  (as  in  static  power) 

Superscripts

augmented (as in the augmented sum of an adder)


Accents 

partial  (as  in  partial  product) 
zigzag  sequence 
normalized 
average  (mean) 

pertaining to an approximate or inexact computational circuit

DEMONSTRATION OF INEXACT COMPUTING
IMPLEMENTED  IN  THE  JPEG  COMPRESSION  ALGORITHM 
USING  PROBABILISTIC  BOOLEAN  LOGIC 
APPLIED  TO  CMOS  COMPONENTS 

I.  Introduction 

The success of Moore’s law [1, 2, 3] has conditioned the semiconductor industry
to expect continuing improvements in high performance chips. The success has produced
a situation in which circuit designers are designing products around Moore’s
Law. Concerns about the possible end of Moore’s Law are being raised, and researchers
are seeking ways to sustain this trend. This search extends into areas such
as probabilistic computing [4, 5, 6, 7, 8, 9] and neuromorphic computing [10, 11, 12, 13].
The general motivation for studying probabilistic computing and inexact methods lies
in three areas: potential energy savings, performance improvement, and area improvement
(that is, reduced density).

While  Moore’s  Law  has  produced  an  expectation  that  more  transistors  can  be 
packed  on  a  chip,  there  is  a  physical  limit  to  classical  scaling  theory  described  by 
Dennard  [14].  At  the  same  time  it  is  difficult  to  reliably  manufacture  billions  of 
transistors  that  operate  without  failure  on  a  single  chip. 

With an insatiable demand for computation, there are limits to the power reduction
that can be realized with traditional digital design. There is also a need to
build a fault-tolerant chip that can “handle failure gracefully” [15]. Because of these
limits,  researchers  are  investigating  methods  such  as  probabilistic  computing  as  ways 
to extract functionality that is “good enough” while operating with less power.

There are two broad approaches to studying inexact methods. First, there are
methods that provide deterministic inexactness (that is, a truth table that is occasionally
wrong). These methods can be applied when the system that contains
the logic can tolerate a wrong answer [16].

Second,  there  is  the  method  of  nondeterministic  inexactness.  Nondeterministic 
inexactness  refers  to  a  circuit  design  that  has  a  certain  probability  of  error  due  to 
unknown  variables  including  noise,  interference,  manufacturing  defects,  or  radiation. 
Note  that  not  all  error  is  created  equal.  There  is  error  due  to  approximation,  and 
there  is  random  error.  There  are  profoundly  different  consequences  of  these  types  of 
error.  For  example,  some  error  may  be  tolerable  when  processing  image  data  or  audio 
data. 

Noise is one of the sources of inexactness, such as when noise corrupts the truth
table. Circuit designers intend to design chips to be very robust. However, when
circuit designers push the boundary by deliberately degrading the design
itself, they do so because they may be able to eliminate some transistors
or (perhaps) use smaller transistors. In such situations, circuit designers start
bending some of the rules of good design practice, because they are trying to achieve
a result that is “good enough”.

Both deterministic and nondeterministic inexactness must be well understood
by the circuit designer. Specifically, the designer needs to understand when and if
inexactness can be applied. Consider the situation in which a designer has prepared
a block diagram of a system. There are circumstances in which designers can inspect
a block diagram and decide which blocks of the design can use inexactness
and which blocks cannot.
For example, it is undesirable for a state machine to jump to a random next state.
Therefore circuit designers would assume that that part of the design (that is, the
state machine) should be handled by traditional design methods, that is, by exact
methods. But the designers may identify other parts of the system design in which
one may realistically use inexactness. The fundamental point of this argument
is that circuit designers cannot use inexactness indiscriminately, and inexactness only
works in certain types of circumstances.

The main contribution of this dissertation is a method to quantify the energy
savings resulting from the decision to accept a specified percentage of error in some
components of a computing system. Recall von Neumann’s pursuit to build reliable
systems from unreliable components [17, 18, 19, 20]. Von Neumann struggled with the
idea of how to obtain correct results when the constituent components are unreliable.
This work was performed shortly after the invention of the transistor, when vacuum tubes
were still in use and failed often (with vacuum tubes, the mean time to failure was 10 to 20
minutes, and computing systems contained many vacuum tubes). Designers started
to worry less when transistors provided higher yield, but out of the work by von
Neumann emerged ideas such as triple modular redundancy.

In  computing  applications,  data  compression  is  necessary  due  to  memory,  disk 
space,  and  network  bandwidth  constraints.  The  Joint  Photographic  Experts  Group 
(JPEG)  File  Interchange  Format  (JFIF)  is  ubiquitous,  and  is  a  very  effective  method 
of  image  compression.  Throughout  this  document,  we  refer  to  the  JFIF  compression 
algorithm,  which  is  used  for  this  research,  as  the  “JPEG  compression  algorithm” 
(not  to  be  confused  with  JPEG2000).  For  many  imaging  applications,  energy  is  at 
a  premium  due  to  battery  life  and  heat  dissipation  concerns.  For  space  applications, 
the  electronic  systems  are  also  susceptible  to  the  effects  of  the  natural  radiation 
environment; for example, solar protons and galactic cosmic rays can cause single-event
upsets within the circuits, which is another form of hardware unreliability.
The emerging field of inexact computing promises to deliver energy-parsimonious
computation by sacrificing the strict requirement for accuracy in digital logic circuits.
The  contribution  of  this  research  will  be  to  advance  the  state  of  the  art  of  inexact 
computing  by  optimizing  the  JPEG  compression  algorithm  for  reduced  energy,  delay, 
and  area. 

Inexact computing is ultimately based on information theory. Consider an adder
which computes a sum $S^{+}$ from inputs A and B. Now consider an approximate
adder with output $\tilde{S}^{+}$, which is an estimator for $S^{+}$. How much information
does $\tilde{S}^{+}$ contain about the desired sum $S^{+}$? With inexact computing, it is
possible to save energy, delay, and area, and still obtain information about $S^{+}$,
without obtaining the exact value of $S^{+}$.
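
One way to make this question concrete is to measure the mutual information between the exact and approximate sums. The following is a minimal sketch under stated assumptions, not the method of this dissertation: the inputs are assumed uniformly distributed, and a simple truncated adder (which ignores the k low-order bits of each operand) stands in for an inexact adder. The names `truncated_sum`, `entropy`, and `sum_information` are hypothetical.

```python
from collections import Counter
from math import log2


def truncated_sum(a, b, k):
    """Hypothetical approximate adder: drop the k low-order bits of each input."""
    return ((a >> k) + (b >> k)) << k


def entropy(counts, total):
    """Shannon entropy, in bits, of an empirical distribution given as counts."""
    return -sum((c / total) * log2(c / total) for c in counts.values())


def sum_information(n_bits, k):
    """Return (H(S+), I(S+; approximate S+)) over all pairs of n_bits-wide inputs."""
    exact, approx, joint = Counter(), Counter(), Counter()
    for a in range(2 ** n_bits):
        for b in range(2 ** n_bits):
            s = a + b                          # exact augmented sum
            s_approx = truncated_sum(a, b, k)  # approximate augmented sum
            exact[s] += 1
            approx[s_approx] += 1
            joint[(s, s_approx)] += 1
    total = 4 ** n_bits                        # number of (a, b) input pairs
    h_exact = entropy(exact, total)
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    mutual = h_exact + entropy(approx, total) - entropy(joint, total)
    return h_exact, mutual


if __name__ == "__main__":
    for k in range(5):
        h, i = sum_information(4, k)
        print(f"k={k}: H(S+) = {h:.3f} bits, I = {i:.3f} bits")
```

With k = 0 the approximate adder is exact and the mutual information equals the entropy of the sum; as k grows, the approximate output carries less information about the desired sum, reaching zero when every input bit is ignored.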

1.1  Strategy  for  Applying  Inexact  Methods 

How do we know whether an inexact design is appropriate for a particular application?
The decision-making process is outlined in the flowchart in Fig. 1. The
flowchart proceeds as follows. First, the designer must consider the algorithm which
the system will execute. If the algorithm is well understood and has been previously
implemented in hardware using exact methods, then historical data will provide a
baseline for the performance, area, and energy consumption required to implement
the algorithm. If the algorithm is new, and has never previously been built into hardware,
then the designer should create an exact design (i.e., schematics or layouts) to
determine the baseline for the algorithm.

From the baseline, the designer can estimate that the whole algorithm will occupy a
fraction $f$ of the chip. The fraction $f$ could be a fraction of the total chip
area, gate count, or other such metric. In the “profiling” step, the designer divides
the algorithm into sub-algorithms. If there are $n$ sub-algorithms, then the $k$th
sub-algorithm occupies an estimated fraction $f_k$ of the chip. The fractions $f_k$
add up to $f$, that is,

$$\sum_{k=1}^{n} f_k = f \qquad (1)$$

Figure 1. Decision-making flowchart for inexact design.

Each sub-algorithm is designed separately. Types of sub-algorithms include: computation,
state machines, encoding, routing, memory storage and retrieval, time synchronization,
and random number generation. Some sub-algorithms are potentially
amenable to inexactness, while others are not. Of these categories, we would argue
that computation, memory storage and retrieval, and random number generation
could possibly be done inexactly in a useful way, while the others would not. However,
future research may find useful ways to perform the other sub-algorithms inexactly
also.
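
The profiling step can be sketched as a small bookkeeping exercise. The example below is illustrative only: the sub-algorithm names and fractions are made up, and `SubAlgorithm`, `AMENABLE`, and `profile` are hypothetical names. It checks that the fractions of the sub-algorithms sum to the whole-algorithm fraction and reports what share of the algorithm falls in the categories argued above to be candidates for inexactness.

```python
from dataclasses import dataclass
from math import isclose

# Categories that the text argues could be done inexactly in a useful way.
AMENABLE = {"computation", "memory", "random number generation"}


@dataclass
class SubAlgorithm:
    name: str
    category: str
    fraction: float  # f_k, estimated share of the chip


def profile(sub_algorithms, f):
    """Check that the f_k sum to f and return the candidate-inexact share of f."""
    total = sum(s.fraction for s in sub_algorithms)
    if not isclose(total, f, rel_tol=1e-9):
        raise ValueError(f"fractions sum to {total}, expected {f}")
    candidate = sum(s.fraction for s in sub_algorithms if s.category in AMENABLE)
    return candidate / f


# Hypothetical numbers, for illustration only.
example = [
    SubAlgorithm("multiply-accumulate datapath", "computation", 0.30),
    SubAlgorithm("control logic", "state machine", 0.05),
    SubAlgorithm("tile buffer", "memory", 0.10),
    SubAlgorithm("output formatting", "encoding", 0.05),
]

if __name__ == "__main__":
    share = profile(example, f=0.50)
    print(f"candidate share of the algorithm: {share:.0%}")
```

A category label alone does not settle the question; each candidate sub-algorithm still has to pass the later steps of the flowchart before an inexact design is adopted.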

Some  sub-algorithms  have  multiple  possible  formulations.  If  an  inexact  design 
is  being  proposed,  then  the  designer  should  choose  the  formulation  that  is  most 
amenable  to  inexact  methods;  for  example,  a  formulation  based  on  addition  is  likely 
more  tolerant  of  inexactness  than  a  formulation  based  on  a  decision  tree.  If,  after 
considering  all  possible  formulations  of  the  sub-algorithm,  there  appears  to  be  no 
possible  benefit  to  an  inexact  design,  then  the  designer  will  choose  to  design  the 
sub-algorithm  using  exact  methods  only. 

If the algorithm is a potentially inexact algorithm, then it can be implemented
in an inexact hardware technology. As device dimensions shrink, it is increasingly
difficult for devices to perform consistently and reliably and to have good noise margins.
Future hardware technologies may be designed to perform in a probabilistic manner;
that is, they will function “correctly” most of the time, but at each individual node there
is a probability p of correctness, where 0.5 < p < 1, and a probability (1 - p) of error.
Errors could be caused by noise, radio frequency (RF) interference, crosstalk between
components, threshold voltage variations, cross-chip variations, or cosmic radiation.
In inexact hardware technology, the interconnect between transistors would also be
susceptible to crosstalk, which is a possible error source. Such technology would
include relaxed design rules with tabulated probabilities of correctness which depend
on the area and spacing allocated to each transistor or interconnect.
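
The per-node probability of correctness can be explored with a small Monte Carlo model. The sketch below is an assumption-laden illustration, not the simulation used in this dissertation: it uses a ripple-carry adder (rather than a Kogge-Stone adder) built from five gates per full adder, and every gate output is flipped independently with probability 1 - p. The function names are hypothetical.

```python
import random


def noisy(bit, p, rng):
    """Return the gate output unchanged with probability p, flipped otherwise."""
    return bit if rng.random() < p else bit ^ 1


def full_adder(a, b, cin, p, rng):
    """One-bit full adder in which each of the five gates is probabilistic."""
    x1 = noisy(a ^ b, p, rng)
    s = noisy(x1 ^ cin, p, rng)
    a1 = noisy(a & b, p, rng)
    a2 = noisy(x1 & cin, p, rng)
    cout = noisy(a1 | a2, p, rng)
    return s, cout


def noisy_add(a, b, n_bits, p, rng):
    """Ripple-carry sum of two n_bits-wide integers, including the carry-out bit."""
    carry, total = 0, 0
    for i in range(n_bits):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry, p, rng)
        total |= s << i
    return total | (carry << n_bits)


def mean_relative_error(n_bits, p, trials, seed=0):
    """Average |approximate - exact| as a fraction of the largest representable sum."""
    rng = random.Random(seed)
    full_scale = 2 ** (n_bits + 1) - 1
    accumulated = 0.0
    for _ in range(trials):
        a = rng.randrange(2 ** n_bits)
        b = rng.randrange(2 ** n_bits)
        accumulated += abs(noisy_add(a, b, n_bits, p, rng) - (a + b)) / full_scale
    return accumulated / trials


if __name__ == "__main__":
    for p in (1.0, 0.999, 0.99, 0.9):
        print(p, mean_relative_error(16, p, trials=2000))
```

Because an error in a high-order bit costs far more than one in a low-order bit, the mean error depends on where the flips land as well as on p, which is one reason the choice of error metric matters later in the flow.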

Based  on  the  exact  algorithm  baseline  and  the  candidate  process  technologies, 
the  designer  should  determine  a  baseline  for  the  performance  of  the  sub-algorithm 
implemented  using  exact  methods.  This  is  used  as  a  basis  for  comparison  with  the 
inexact  design. 

The  next  step  is  to  choose  the  number  of  bits  of  precision  required  to  process  the 
data.  Information  collected  from  analog  sensors  is  inherently  uncertain,  and  needs 
only a limited number of significant figures to express it. This should be reflected in
the computing hardware. For example, if the significant figures of the data can be
expressed  in  eight  bits,  then  using  a  64-bit  adder  to  process  it  would  be  a  waste  of 
hardware  and  energy. 
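
As a rough illustration of this step, the number of bits can be estimated from the ratio of a sensor's full-scale range to its measurement uncertainty. This is a rule of thumb assumed here for the example, not a formula taken from this dissertation, and `required_bits` is a hypothetical helper.

```python
from math import ceil, log2


def required_bits(full_scale, uncertainty):
    """Bits needed to resolve full_scale in steps no finer than the uncertainty.

    Assumes a uniform quantizer; both arguments are in the same physical unit.
    """
    if full_scale <= 0 or uncertainty <= 0:
        raise ValueError("full_scale and uncertainty must be positive")
    levels = full_scale / uncertainty
    return max(1, ceil(log2(levels)))


if __name__ == "__main__":
    # Hypothetical sensor: 1 V range with about 4 mV of uncertainty.
    print(required_bits(1.0, 0.004))
```

Under this assumption, a datapath much wider than the result suggests would mostly be processing noise.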

The  designer  must  also  determine  the  constraints  which  the  design  must  meet. 
Constraints  may  include  timing,  area,  current,  and  power  dissipation. 

Next,  the  designer  chooses  the  metrics  by  which  the  system  will  be  evaluated. 
These  may  include  the  energy  consumption,  delay,  and  area  of  the  hardware.  Other 
metrics will reflect the overall quality of the system output. For example, in
image compression, the metrics for system output include root-mean-square (RMS)
error, signal-to-noise ratio (SNR), and compression ratio. For human consumption
of images, however, simple metrics do not adequately summarize the quality of an
image, so it is up to the user’s opinion as to whether the image quality is “good
enough”.

The next step is to choose the error tolerance, or error bounds, for the overall
system output. Error metrics could include statistics such as the maximum possible
error, RMS error, or likelihood of nonzero error. The system design will not be allowed
to exceed the error tolerance.
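
The three error statistics named above are straightforward to compute from paired exact and approximate outputs. The sketch below is a minimal illustration; the function names and the dictionary keys are hypothetical, and the maximum is taken over the observed samples rather than over all possible inputs.

```python
from math import sqrt


def error_metrics(exact, approx):
    """Summarize the error between paired exact and approximate outputs."""
    if len(exact) != len(approx) or not exact:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - e for e, a in zip(exact, approx)]
    n = len(errors)
    return {
        "max_abs_error": max(abs(err) for err in errors),
        "rms_error": sqrt(sum(err * err for err in errors) / n),
        "error_rate": sum(1 for err in errors if err != 0) / n,
    }


def within_tolerance(metrics, bounds):
    """True if every bounded metric is at or below its limit."""
    return all(metrics[name] <= limit for name, limit in bounds.items())


if __name__ == "__main__":
    m = error_metrics([10, 20, 30, 40], [10, 22, 30, 36])
    print(m, within_tolerance(m, {"max_abs_error": 4, "error_rate": 0.5}))
```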

Probabilistic pruning and probabilistic logic minimization techniques are methods
of simplifying the digital logic hardware while producing an approximate computational
output in a deterministic manner. Probabilistic pruning and probabilistic logic
minimization simultaneously meet the objectives of reducing energy consumption,
delay, and chip area, by sacrificing the accuracy of the output in some cases. These
techniques are applied repeatedly until the maximum error tolerance is reached [21].
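
The iterate-until-tolerance loop can be sketched with a much simpler stand-in for pruning. In the example below, which is not the pruning algorithm of [21], "pruning" is modeled as discarding one more low-order bit position of an adder at each step, and the loop stops just before the worst-case error would exceed the tolerance. All names are hypothetical, and the exhaustive search is only practical for small bit widths.

```python
def truncated_add(a, b, k):
    """Stand-in for a pruned adder: the k low-order bit positions are removed."""
    return ((a >> k) + (b >> k)) << k


def worst_case_error(n_bits, k):
    """Largest absolute error over all input pairs (exhaustive search)."""
    return max(
        abs((a + b) - truncated_add(a, b, k))
        for a in range(2 ** n_bits)
        for b in range(2 ** n_bits)
    )


def prune_until_tolerance(n_bits, max_error):
    """Remove bit positions one at a time while the error bound still holds."""
    k = 0
    while k < n_bits and worst_case_error(n_bits, k + 1) <= max_error:
        k += 1
    return k


if __name__ == "__main__":
    for tolerance in (1, 6, 14):
        print(tolerance, prune_until_tolerance(6, tolerance))
```

Each removed bit position saves hardware but roughly doubles the worst-case error, so the tolerance chosen in the previous step directly limits how far the simplification can go.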

After  using  deterministic  inexact  design  techniques,  the  designer  considers  the 
effects  of  non-deterministic  errors  due  to  the  inexact  nature  of  the  process  technology. 
The designer can “tune” the design to reduce area and energy consumption, until the
effects  of  non-deterministic  errors  exceed  the  pre-determined  error  bound,  or  the 
design  constraints  (e.g.  delay)  are  no  longer  met. 

After  creating  the  inexact  design  for  the  sub-algorithm,  the  designer  compares  the 
inexact  design  with  the  baseline  previously  determined.  If  the  inexact  design  results 
in  a  substantial  savings  of  energy,  delay,  area,  energy-delay  product,  or  other  such 
metric,  without  an  excessive  amount  of  error,  then  the  designer  chooses  to  use  the 
inexact design for the sub-algorithm. If, however, there is little apparent benefit to
the  inexact  design,  then  the  designer  chooses  to  use  an  exact  design  instead. 
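
The comparison against the baseline can be written as a simple decision rule. The sketch below is one possible formulation under stated assumptions: it uses the energy-delay product as the figure of merit, and the 10 percent minimum saving is an arbitrary threshold chosen for illustration. The example values are normalized and loosely follow the 0.6 V adder result quoted in the abstract (half the energy, 38 percent lower EDP, 3.0 percent average deviation); the implied delay ratio of 1.24 is derived here from those two figures and is not stated in the source.

```python
from dataclasses import dataclass


@dataclass
class DesignMetrics:
    energy: float  # energy per operation (any consistent unit)
    delay: float   # delay per operation (any consistent unit)
    error: float   # error in the chosen metric, e.g. relative RMS error

    @property
    def edp(self):
        """Energy-delay product."""
        return self.energy * self.delay


def choose_design(baseline, candidate, error_tolerance, min_edp_saving=0.10):
    """Return 'inexact' only if the error bound holds and the saving is substantial."""
    if candidate.error > error_tolerance:
        return "exact"
    saving = 1.0 - candidate.edp / baseline.edp
    return "inexact" if saving >= min_edp_saving else "exact"


if __name__ == "__main__":
    exact_adder = DesignMetrics(energy=1.0, delay=1.0, error=0.0)
    inexact_adder = DesignMetrics(energy=0.5, delay=1.24, error=0.03)
    print(choose_design(exact_adder, inexact_adder, error_tolerance=0.05))
```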

Once hardware has been designed for all sub-algorithms, the parts must be evaluated
together as a whole system. At this point, the designer makes sure the entire
algorithm works properly, and then compares it to the exact baseline for the whole
algorithm. If the overall performance meets user requirements, then the design is
submitted for fabrication. If not, then the designer must re-profile some or all of the
algorithm, and then re-design some or all of the sub-algorithms until performance is
satisfactory.


1.2  Motivational  Link  to  Air  Force  Needs  and  Vision 

The  need  in  the  U.S.  Air  Force  for  the  capability  to  create  perfect  computing 
systems  out  of  imperfect  components  to  achieve  military  and  space  objectives  dates  at 
least  to  the  construction  of  the  earliest  computing  systems  including  those  built  from 
mechanical  components,  vacuum  tubes,  and  instrumentation  in  the  earliest  Apollo 
missions.  There  are  some  applications  of  electronic  systems  that  can  tolerate  inexact 
systems,  and  some  applications  that  cannot  tolerate  inexactness. 

For  example,  for  a  life  support  system,  the  system  should  be  as  reliable  as  possible, 
and  inexactness  is  undesirable.  For  a  satellite  system  that  can  operate  in  low  power 
mode,  inexact  computing  may  be  tolerated  to  achieve  trade-offs  in  size,  weight,  and 
power.  To  incorporate  inexact  systems  in  space  applications,  there  is  a  need  to 
quantify  trade-offs  in  size,  weight,  and  power. 

This dissertation is concerned with exploring the incorporation of inexactness in
the JPEG algorithm, shown in Fig. 2, which is by its nature a lossy compression
algorithm, where the existence of loss indicates the user’s willingness to accept error.
As a contribution of this work, we find
that the color space transformation (CST) has a 55% energy reduction per pixel of
an uncompressed image, and the discrete cosine transformation step also has a 55%
energy reduction per pixel, for the case in which the probability of correctness p =
0.99. These results are promising and indicate that JPEG can be considered a fitting
example for this type of study. Future work will consider the additional components,
which are: tiling, quantization, zigzagging, run-amplitude encoding, and Huffman
encoding. The JPEG algorithm can be implemented in hardware or software. Tiling
and zigzagging are simply routing of data, and in a hardware implementation, it is
possible to implement these two steps via wiring only. In a software implementation,
we would not want errors in the instruction pipeline or memory addressing, so in that
case inexact tiling and zigzagging would not be desirable. In the JPEG algorithm, the
user’s willingness to accept error leads to trade-offs in size, weight, and power within
the color space transformation, discrete cosine transformation, and quantization steps,
as indicated by the shaded areas in Fig. 2.

Figure 2. Block diagram of the JPEG image compression algorithm. In this dissertation,
the JPEG algorithm is a motivational example for inexact computing. The shaded
boxes show areas where inexact methods can be considered (for example, in adder circuits
and multiplier circuits). The white boxes show areas where inexact methods
cannot be considered (“keep-out zones”).
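
For reference, the two steps studied in this work reduce to additions and multiplications, which is why adders and multipliers are the natural place to introduce inexactness. The sketch below shows exact versions of both steps in plain Python. The color conversion uses the standard JFIF coefficients; the DCT is written as a product with a unitary matrix U, which matches the nomenclature of this document but is otherwise an assumption about the formulation. The level shift and quantization that JPEG applies around the DCT are omitted, and the function names are hypothetical.

```python
from math import cos, pi, sqrt

N = 8  # JPEG tile size


def rgb_to_ycbcr(r, g, b):
    """JFIF color space transformation for one 8-bit RGB pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(v):
        return max(0, min(255, round(v)))

    return clamp(y), clamp(cb), clamp(cr)


# Unitary DCT-II matrix: row i is the ith cosine basis vector.
U = [
    [sqrt((1 if i == 0 else 2) / N) * cos((2 * j + 1) * i * pi / (2 * N))
     for j in range(N)]
    for i in range(N)
]


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct_2d(tile):
    """Two-dimensional DCT of an 8x8 tile X, computed as U X U^T."""
    return matmul(matmul(U, tile), transpose(U))


if __name__ == "__main__":
    print(rgb_to_ycbcr(255, 255, 255))
    flat = [[1.0] * N for _ in range(N)]
    print(round(dct_2d(flat)[0][0], 6))
```

Each output pixel of the color transformation costs nine multiplications and several additions, and each DCT tile costs two 8x8 matrix products, so replacing those arithmetic units with inexact ones affects most of the work in both steps.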

We  now  consider  briefly  a  historical  example  of  why  the  Air  Force  should  care 
about  implementing  inexactness  in  computing  systems.  In  the  book  entitled  Digital 
Apollo,  the  author  David  Mindell,  Dibner  Professor  of  the  History  of  Engineering  and 
Manufacturing,  Professor  of  Engineering  Systems,  and  Director  of  the  Program  in 
Science,  Technology,  and  Society  at  MIT,  describes  how  the  Apollo  astronauts  became 
their  own  “ultimate  redundant  components”  by  carrying  on  board  with  them  extra 
copies  of  the  computer  hardware  so  that  in  case  of  failure,  they  could  replace  the  faulty 
hardware with one of the backups. Triple modular redundancy is one example of a now-common
approach to creating more perfect computing systems out of components that
are understood to be imperfect.

In  the  book,  he  writes, 


“Robert  Chilton  remembered  that  Hall  and  his  group  paid  constant 
attention  to  reliability  questions,  though  NASA  wasn’t  prepared  to  give 
them a specification. . . . [more specific than] ‘as reliable as a parachute’

. . .  with  one  in  a  hundred  chance  the  mission  would  fail,  and  a  one  in  a 
thousand  chance  the  astronauts  would  not  survive.”  (page  128,  Mindell) 

“Hall  estimated  that  his  Block  I  integrated  circuit  computer  would 
have  a  reliability  of  0.966  . . . .,  but  the  spec  he  had  been  given  required 
reliability  nearly  ten  times  better.  To  make  up  the  difference  he  proposed 
[relying]  on  the  skills  of  the  astronauts:  they  would  repair  the  computer 
during  flight. . . .  Hall  proposed  that  Apollo  flights  also  carry  a  special 
machine,  a  ‘MicroMonitor,’  a  smaller  version  of  the  equipment  used  to 
check  out  the  computer  on  the  ground.  The  device  was  heavy  and  took 
up space, required its operator to ‘exercise considerable thought,’ and required
the operator to have a mere three to six months of training. . . ‘This
device is known to be effective,’ Hall wrote, ‘in direct proportion to the
training  and  native  skill  of  the  operator’.”  (page  129,  Mindell) 

“Astronauts  had  been  billed  as  the  ultimate  redundant  components. 
Asking  them  to  improve  the  reliability  of  their  equipment  seemed  sensible, 
but  it  proved  no  simple  task.”  (page  130,  Mindell)  [22] 

In December 1965, Eldon C. Hall of the MIT Instrumentation Laboratory issued
report E-1880, entitled “A Case History of the AGC Integrated Logic Circuits” [23].
In this report, Hall explains that the use of “one single, simple integrated circuit
for all logic functions” was required to achieve the goals of “low weight, volume,
and power coupled with extreme high reliability.” As he describes, the “one single,
simple integrated circuit” was the three-input NOR gate (the NOR3 logic gate). He
writes about the following tradeoffs in the selection of the logic element in the Apollo
Guidance Computer:


“The logic element utilized in the Apollo Guidance Computer is the three
input NOR Gate. ... At the time that the decision was made to use integrated
circuits, the NOR Gate was the only device available in large
quantities. The simplicity of the circuit allowed several manufacturers to
produce interchangeable devices so that reasonable competition was assured.
Because of recent process development in integrated circuits, the
NOR Gate has been able to remain competitive on the basis of speed,
power, and noise immunity. This circuit is used at 3 V and 15 mW, but is
rated at 8 V and 100 mW. Unpowered temperature rating is 150 degrees
C. The basic simplicity of the three input gate aids an effective screening
process. All transistors and resistors can be tested to insure product
uniformity. The simplicity of the circuit also aids in the quick detection
and diagnosing of insidious failures without extensive probing as required
with more complicated circuits.” (page 4, Hall 1965).

It has been recognized since at least the time of this report (1965) that inherent
reliability gains must be implemented in the design stages of the computer. This
dissertation recognizes that tradeoffs between energy, power, and reliability continue
even with the most advanced silicon CMOS technologies available today. One can
even state that the tradeoffs between energy, power, and reliability are especially
of concern to the U.S. Air Force today, because the requirements placed on silicon CMOS
technologies are predominantly, and increasingly, driven by the consumer marketplace.
This was much less the case during the Apollo missions, when the DoD
formed a greater portion of the demand for integrated circuits.

Today, however, the “strategic environment that the Air Force faces over the next
two decades is substantially different from that which has dominated throughout most
of its history,” as explained in the document entitled “A Vision for Air Force Science &
Technology During 2010-2030” [24]. This document provides an overview of the role
of 110 Key Technology Areas (KTAs), including the KTA of advanced computing
architectures, that support the 12 Potential Capability Areas (PCAs) of the U.S. Air
Force.

The document “A Vision for Air Force Science & Technology During 2010-2030”
also describes advantages and additional capabilities that improvements in power,
performance, and area can provide in electronics to enable the U.S. Air Force to
achieve superiority in air, space, and cyberspace domains. Specific examples are
persistent near-space communications relays (page 63), where power and thermal
management (heat dissipation) challenges exist. Reductions in size, weight, power, and
thermal management requirements will further enable greater integration of complex
systems in advanced fighters, space satellites, and “other tactical platforms” (page
72).

This dissertation presents a demonstration of inexact computing implemented in
the JPEG compression algorithm using probabilistic Boolean logic applied to CMOS
components, with a specific focus on the most advanced silicon CMOS technology
currently in high-volume manufacturing (namely, the 14 nm FinFET silicon
CMOS technology). The JPEG algorithm is selected as a motivational example because
it is widely accessible to the U.S. Air Force community. It is also widely used
worldwide in many areas, including but not limited to the military, education,
business, and by people of all ages who capture images on their personal electronics,
cameras, and cell phones. This dissertation asks how much energy and power can be
saved if one accepts additional tradeoffs in accuracy, leading to potential
advantages such as lower power consumption, increased battery life, and a decreased
need to dissipate heat.

The goal of this dissertation is to present a demonstration of the JPEG algorithm
in which two components of the algorithm (namely the first step, color space
transformation, and the third step, discrete cosine transformation) take advantage of the
reduced energy and power that can be achieved when one accepts a certain amount
of inexactness in the result. Energy-accuracy tradeoffs in adders and multipliers are
explored in detail, and detailed results are presented quantifying the extent to which
the power-delay product can be reduced as a function of probability of correctness.
The dissertation applies the inexact JPEG algorithm to an analysis of uncompressed
TIFF images of an F-16 U.S. Air Force plane provided by the University of Southern
California, as shown in Fig. 3. In this dissertation, we only analyze the data from the
intensity (Y) component of the image, so in Chapter V the figures appear in
black-and-white; however, we expect very similar results for the color components (Cb and
Cr) since they are processed in a very similar way. The results quantify tradeoffs
between the probability of correctness, the SNR, and the Root-Mean-Square (RMS)
error. Specifically, the results show that as the probability of correctness
decreases, the SNR decreases and the RMS error
increases. Values are quantified for each of the tradeoffs.

1.3  Contributions 

We  have  shown  that  we  could  cut  energy  demand  in  half  with  16-bit  Kogge-Stone 
adders  that  deviated  from  the  correct  value  by  an  average  of  3.0  percent  in  14  nm 
CMOS  FinFET  technology,  assuming  a  noise  amplitude  of  3  x  10  V^/Hz  (see  Fig. 
32). This was achieved by reducing VDD to 0.6 V instead of its maximum value of
0.8  V.  The  energy-delay  product  (EDP)  was  reduced  by  38  percent  (see  Fig.  33). 

Adders that got wrong answers with a larger deviation of about 7.5 percent (using
VDD = 0.5 V) were up to 3.7 times more energy-efficient, and the EDP was reduced
by 45 percent.

Adders that got wrong answers with a larger deviation of about 19 percent (using
VDD = 0.3 V) were up to 13 times more energy-efficient, and the EDP was reduced
by 35 percent.

We used inexact adders and inexact multipliers to perform the color space
transform, and found that with a 1 percent probability of error at each logic gate, the
letters “F-16”, which are 14 pixels tall, and “U.S. AIR FORCE”, which are 8 to 10
pixels tall, are readable in the processed image, as shown in Fig. 40f, where the
relative RMS error is 5.4 percent.

Figure 3. Original uncompressed image of an F-16, file name 4.2.05.tiff, from the USC
SIPI image database [25].

We  used  inexact  adders  and  inexact  multipliers  to  perform  the  discrete  cosine 
transform,  and  found  that  with  a  1  percent  probability  of  error  at  each  logic  gate, 
the  letters  “F-16”,  which  are  14  pixels  tall,  and  “U.S.  AIR  FORCE”,  which  are  8  to 
10  pixels  tall,  are  readable  in  the  processed  image,  as  shown  in  Fig.  41f,  where  the 
relative  RMS  error  is  20  percent. 

The next section presents a literature review of inexact computing and describes
prior work, providing background information regarding the approach to inexact
computing that is used in this dissertation.

II. Literature Review


Inexactness has typically been understood to be inherent in analog circuits, but
not in the conventional understanding of digital logic. Previous work in the field of
inexact digital CMOS has focused on two types of inexactness: (1) circuits which
produce an approximate result, but are deterministically erroneous by design, and
(2) circuits which suffer from the effects of random noise. The first type of
inexactness is achieved via probabilistic pruning [21] or probabilistic logic minimization [26].
Probabilistic pruning is a bottom-up approach in which components are removed
from the schematic of a circuit, for the purpose of saving energy, delay, and area,
while producing output which is “correct” in the majority of cases. Probabilistic
logic minimization accomplishes the same objective of saving energy, delay, and area
by creating an erroneous, but simpler, design based on a modified truth table which is
“correct” in the majority of cases. Probabilistic pruning and probabilistic logic
minimization both produce an approximate result, constrained by a desired error bound
[21, 26]. The designer chooses the error bound that meets the needs of the system.
Probabilistic pruning and probabilistic logic minimization enable the designer to
reduce energy consumption, delay, and chip area by creating a circuit which is simpler
than the conventional (exact) circuit.

From the perspective of this dissertation, inexact computing is contrary to the
notion of error detection and correction. Whereas the primary goal of inexact
computing is to reduce energy consumption [27], error detection and correction techniques
contain additional components which increase the energy, delay, and area of the
circuit. These techniques could be used in conjunction with inexact computing, but
only if the overall energy savings outweigh the additional costs. For example, triple
modular redundancy could be used if the energy consumption of the inexact circuit is
less than 1/3 the energy consumption of the equivalent exact circuit.

The second type of inexactness, which is non-deterministic (noise-susceptible)
inexactness, has been achieved by severely lowering the power supply voltage, thus
reducing the noise margins of the circuit [28, 29, 30]. Great energy savings can be
achieved this way, if the designer is willing to tolerate the error. For example, for an
inverter in 0.25 µm CMOS technology with an RMS noise magnitude of 0.4 V, [27]
reports that a 300% energy reduction per switching cycle can be achieved by allowing the
probability of correctness p to drop from 0.99 to 0.95.

Prior  work  investigated  energy  and  performance  with  the  use  of  high  level  C- 
based  simulations  [31,  32,  33].  These  papers  treat  individual  logic  gates  as  unreliable, 
with  an  associated  probability  of  correctness  p  <  1,  or  equivalently  a  finite  error 
probability  1  —  p,  and  present  simulation  results  of  complex  circuits  built  out  of 
unreliable  primitive  elements.  Additional  prior  work  used  circuit  simulations  of  a  32- 
bit  weighted  voltage-scaled  adder  with  carry  look-ahead  capability  and  demonstrated 
a  calculation  error  of  10“®  while  reducing  the  total  power  consumption  by  more  than 
40%  in  45  nm  CMOS  FDSOI  technology  [34].  This  shows  a  promising  approach 
to  the  study  of  inexact  adders,  which  we  have  used  in  our  work.  Additional  prior 
work  shows  a  four-fold  reduction  in  energy-delay  product  using  probabilistic  pruning 
of  64-bit  Kogge-Stone  adders,  at  the  expense  of  an  8%  average  error  [35].  In  this 
dissertation,  we  use  similar  metrics  to  evaluate  adders. 

In some cases, erroneous bits inside a circuit have no impact on its final output.
Researchers are interested in “don’t care sets” which describe sets of erroneous inputs
that don’t cause errors at the output (“observability don’t care” conditions), or input
vectors that can never occur (“satisfiability don’t care” conditions) [16, 36]. This
dissertation does not take that approach; however, it is clearly relevant to the field
of inexact computing. For example, if a designer is aware of the observability don’t
care conditions of a device, then the designer could simplify the circuit using probabilistic
logic minimization without causing any errors at the output. As another example,
random noise inside a digital circuit could cause errors internal to the circuit without
affecting the output; based on the observability don’t care conditions, the designer
could choose to use unreliable components in those areas of the circuit in which errors
would be unlikely to affect the output. On the other hand, while satisfiability don’t
care conditions may exist for an error-free logic circuit, they may not exist if that
circuit is susceptible to random errors, i.e. random noise may cause unexpected input
conditions.

The  approach  of  this  literature  review  is  informed  by  the  goal  of  this  dissertation. 
The  approach  of  this  work  is  to  investigate  the  energy  reduction  and  energy-delay 
product  in  a  selection  of  adder  and  multiplier  architectures  made  using  unreliable  logic 
gates,  and  then  to  build  an  inexact  JPEG  compression  algorithm  using  these  inexact 
adders  and  multipliers.  In  the  interests  of  high-speed  simulation  and  of  collecting 
large  sample  sizes,  these  simulations  were  performed  with  Matlab  using  a  Probabilistic 
Boolean  Logic  (PBL)  error  model. 

The PBL error model is more simplistic than an analog error model. In this
dissertation, we also compute the energy and Energy-Delay Product (EDP) of selected
adder architectures in 14 nm FinFET CMOS technology as a function of error. This
is similar to the approach used in [34]. As an example of the approach presented in
this dissertation, consider a circuit design that exhibits a 20% error rate. The results
in this dissertation show that, for the specific circuits considered, the payback for
accepting a 20% error rate is an energy reduction of 90%.
If a circuit designer were willing to tolerate 20% error and gain this energy saving, one can
then ask, “What good is this?” For example, one could implement triple
modular redundancy either temporally or spatially: temporally, one could sample
the same data at three different points in time; spatially, one could replicate the
circuit design three times, knowing that, statistically, the majority vote is more
likely to be correct than any single copy. The main point is that when redundancy can
restore the accuracy to a practically perfect answer while the circuit still saves a
large amount of energy, accepting the error is worth it.
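
The redundancy argument above can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the simulations reported in this dissertation. It assumes that each of three copies produces a single-bit result that is correct with probability 0.80 independently of the others, that each copy consumes 10% of the energy of the exact circuit, and that the energy of the voter is negligible; all three assumptions are ours.

```python
def majority_of_three(p):
    """Probability that at least two of three independent copies are correct."""
    return p**3 + 3 * p**2 * (1 - p)

p_single = 0.80          # assumed: one inexact copy is correct 80% of the time
energy_per_copy = 0.10   # assumed: each copy uses 10% of the exact circuit's energy

p_voted = majority_of_three(p_single)
energy_total = 3 * energy_per_copy   # voter energy is ignored in this sketch

print(f"single copy correct:   {p_single:.3f}")
print(f"majority vote correct: {p_voted:.3f}")
print(f"energy relative to the exact circuit: {energy_total:.2f}")
```

Under these assumptions the vote raises the probability of a correct result from 0.80 to about 0.90 while using 30% of the exact energy, which is below the 1/3 threshold per copy mentioned earlier. Reaching a higher probability would require more copies or a lower per-copy error rate.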

III. Background


In this section, we present a taxonomy of inexact computing. We review
integer adder architectures and show how the model of PBL can be applied to adders.
We explain how PBL can be used within a binary or an analog circuit model, and
introduce nomenclature used throughout the dissertation. Next, we review integer
multiplier architectures. We review probability distributions and Maximum
Likelihood Estimation (MLE), and explain how they can be used to characterize the error
distributions of inexact adders and multipliers. Finally, we provide a detailed review
of the JPEG compression algorithm.

The  following  definitions  are  useful  for  understanding  the  framework  of  inexact 
computing: 

•  Inexact:  Probabilistic  methodology  for  determining  an  answer. 

•  Imprecision:  Small  uncertainties  about  the  LSBs  of  the  answer.  The  maximum 
possible  error  that  can  occur.  Imprecision  is  inherent  in  data  collected  from 
analog  sensors. 

•  Probability  of  Correctness:  Determines  the  dispersion  (i.e.  standard  deviation) 
of  the  errors. 

•  Erroneous:  Failure  of  the  system  to  compute  a  useful  answer. 

3.1  Taxonomy  of  Inexact  Computing 

3.1.1  Deterministic. 

Many  different  sources  of  deterministic  error  are  possible.  Error  can  be  measured 
in  many  different  ways.  The  amount  of  error  to  tolerate  is  a  design  parameter  for  the 
inexact  system.  Error  sources  include: 

• Limited precision of the processor (number of bits)

•  Inexact  design  techniques,  such  as  probabilistic  pruning  or  probabilistic  logic 
minimization 

3.1.1.1 Limited Precision.

Digital computers are inherently inexact, because they have limited precision. For
example, using double-precision (64-bit) floating-point numbers, 1/6 is approximated
as 0.166666666666666660. If we add six copies of that approximation together, the result
is slightly less than 1. For practical purposes, data being analyzed do not require
64 bits of precision. Furthermore, analog data are inherently bounded by a range of
uncertainty, and this uncertainty carries into the digital domain when analog data
are digitized. Data collected from actual experiments do not have infinite precision;
the numbers have a limited number of “significant figures” or bits of information.
Whereas a computer may store a piece of experimental data in a 64-bit register, only
the first eight bits (for example) may contain meaningful information based on the
experiment.
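
The floating-point example above can be reproduced directly. The short Python sketch below is an illustration we add here, assuming IEEE 754 double precision with round-to-nearest; it accumulates six copies of the stored approximation of 1/6 one at a time. A plain loop is used on purpose, because a library summation routine that compensates for rounding could hide the effect.

```python
a = 1.0 / 6.0            # nearest double-precision value to 1/6

total = 0.0
for _ in range(6):       # plain left-to-right accumulation, one addition at a time
    total += a

print(f"{a:.18f}")       # decimal expansion of the stored approximation
print(total)             # accumulated sum of six copies
print(total < 1.0)       # True if the sum falls short of 1
```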

The limits of the information content carry forward from the source data into other
data computed from it. If we have an $N_A$-bit number $A$ which contains $N_{I(A)}$ bits of
information, and an $N_B$-bit number $B$ which contains $N_{I(B)}$ bits of information,
then the product $AB$ has $N_A + N_B$ bits, but only $N_{I(P)}$ bits of information, where
$N_{I(P)}$ is the lesser of $N_{I(A)}$ and $N_{I(B)}$. The information content of the sum $A + B$ is
limited by the greater of $LSB_{I(A)}$ and $LSB_{I(B)}$, where $LSB_{I(A)}$ is the least significant
bit of $A$ containing meaningful information, and $LSB_{I(B)}$ is the least significant bit
of $B$ containing meaningful information.

The point of this is that we do not need to build a 64-bit adder if an 8-bit adder can
handle the information content. Obviously, adders and multipliers of 64-bit numbers
occupy more area on chip, have longer delay time, and consume more energy than
adders and multipliers of 8-bit numbers (all else being equal). Since the objectives of
inexact computing include savings of energy, delay, and area, eliminating unneeded
precision is consistent with the philosophy of inexact computing. Details about this
technique applied to the JPEG compression algorithm are explained in Section 4.6.

3.1.1.2 Probabilistic Pruning.

Probabilistic pruning is a bottom-up, architecture-level approach to inexact
computing [21, 35]. It is accomplished by taking an exact design, and then deleting those
logic gates which are least used and have the least impact on the overall accuracy
of the system. Deleting components from the design, like pruning leaves off a tree,
reduces the delay, area, and power consumption of the system. The result is a
computer which is inaccurate by design, but only in a limited number of cases which
occur infrequently. To decide which components to prune, the designer looks at each
element in the circuit and considers: (1) the probability of the element being active
at any given time, and (2) the magnitude of the error that would result from deleting
that circuit element. After pruning a component, the designer then “heals” the
floating inputs of the remaining elements that were previously connected to the pruned
element, as described in Section 3.1.1.2.3. The designer continues pruning the circuit
until the error of the pruned circuit reaches a desired error bound ($\varepsilon_{\max}$).

3.1.1.2.1 Probability of an Element Being Active.

The probability of a logic element being active depends on the application, and
is determined by analysis, modeling, or simulation. Specifically, the designer must
predict the probability of every possible input vector to the circuit. This depends on
the expected data set. Then the designer can determine the accuracy penalty which
would result from pruning a circuit element.

3.1.1.2.2 Quantifying Error.

The designer chooses an error metric depending on the application. Lingamneni
[21] defines three possible metrics: average error, error rate, and relative error
magnitude. Additionally, these error metrics can be weighted or uniform (unweighted).
In the unweighted case, all bits have equal weight when calculating error. In the
weighted case, the $j$th bit of a binary number is assigned a weight factor $\eta_j$ equal to

$$\eta_j = 2^j.$$

Average error: The average error of a pruned circuit $Q'$ relative to an exact circuit
$Q$ is computed as [21]:

$$Er(Q') = \sum_{k=1}^{V} p_k \left| Y_k - Y'_k \right| \le \varepsilon_{\max} \qquad (2)$$

where:

$V$ = the number of possible input vectors; or else the number sampled
$\langle y_{k,1}, y_{k,2}, \ldots, y_{k,n} \rangle$ = output vector of exact circuit
$\langle y'_{k,1}, y'_{k,2}, \ldots, y'_{k,n} \rangle$ = output vector of pruned circuit
$n$ = number of bits
$p_k$ = probability of occurrence for each input vector
$\varepsilon_{\max}$ = desired error bound
$\eta_j = 2^j$ = weight factor of the $j$th bit of $Y_k$ and $Y'_k$ (weighted error model),
or $\eta_j = 1$ for all $j$ (unweighted error model)

Error rate:

$$\text{Error Rate} = \frac{\text{Number of Erroneous Computations}}{\text{Total Number of Computations}} \qquad (3)$$

Relative error magnitude:

$$\text{Relative Error Magnitude} = \frac{1}{V} \sum_{k=1}^{V} \frac{\left| Y_k - Y'_k \right|}{Y_k} \qquad (4)$$


By pruning away circuit elements which are seldom active, or which have little
effect on the final output, the errors from Equations (3)-(4) will be small. To obtain
maximum savings in energy, delay, and area, the designer will continue pruning until
the average error reaches the desired error bound ($\varepsilon_{\max}$). To predict error rates for
complex circuits it may not be practical to compute the errors across the entire input
space; therefore, a random sample may be chosen.
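
As an illustration of how the three error metrics might be evaluated, the following Python sketch enumerates every input vector of a small adder. The circuit is hypothetical: a 2-bit adder whose least significant sum bit is assumed to have been pruned and tied to ground, with all input vectors equally likely. Outputs are compared as integers, which corresponds to the weighted model with a weight of $2^j$ on bit $j$. Terms with an exact output of zero are skipped in the relative error, which is our own convention for avoiding division by zero.

```python
from itertools import product

def exact_adder(a, b):
    return a + b                      # exact sum of two 2-bit operands (0..6)

def pruned_adder(a, b):
    return (a + b) & ~1               # hypothetical pruning: sum LSB tied to ground

inputs = list(product(range(4), repeat=2))    # all V = 16 input vectors
V = len(inputs)
p = [1.0 / V] * V                             # assumed uniform input probabilities

Y = [exact_adder(a, b) for a, b in inputs]
Yp = [pruned_adder(a, b) for a, b in inputs]

# Average error: sum over k of p_k * |Y_k - Y'_k|
average_error = sum(pk * abs(y - yp) for pk, y, yp in zip(p, Y, Yp))

# Error rate: erroneous computations divided by total computations
error_rate = sum(1 for y, yp in zip(Y, Yp) if y != yp) / V

# Relative error magnitude: mean of |Y_k - Y'_k| / Y_k over nonzero Y_k
nonzero = [(y, yp) for y, yp in zip(Y, Yp) if y != 0]
relative_error = sum(abs(y - yp) / y for y, yp in nonzero) / len(nonzero)

print(average_error, error_rate, relative_error)
```

For larger circuits the list of inputs would be replaced by a random sample, as noted above.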

3.1.1.2.3 Healing.

After pruning away a circuit element, the inputs of some of the remaining circuit
elements will be floating and undefined. The designer can heal each floating input
in one of three ways: (1) connect it to ground, (2) connect it to the supply
voltage VDD, or (3) connect it to one of the inputs of the pruned element. The
best choice is whichever one minimizes the error.
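
The choice among healing options can be made by exhaustive comparison when the circuit is small. The Python sketch below uses a toy circuit of our own choosing (an AND gate feeding an OR gate), assumes equally likely inputs, and uses the error rate as the metric; none of these choices is taken from the designs studied later in this dissertation.

```python
from itertools import product

def exact(a, b, c):
    return (a & b) | c            # toy circuit: AND gate feeding an OR gate

# After pruning the AND gate, the first input of the OR gate floats.
# Candidate healings: ground, supply, or one of the pruned gate's inputs.
healings = {
    "ground":  lambda a, b, c: 0 | c,
    "VDD":     lambda a, b, c: 1 | c,
    "input a": lambda a, b, c: a | c,
    "input b": lambda a, b, c: b | c,
}

vectors = list(product((0, 1), repeat=3))     # assumed equally likely inputs

def error_rate(healed):
    wrong = sum(1 for v in vectors if healed(*v) != exact(*v))
    return wrong / len(vectors)

rates = {name: error_rate(fn) for name, fn in healings.items()}
best = min(rates, key=rates.get)              # first option with the lowest rate

for name, rate in rates.items():
    print(f"{name:8s} error rate = {rate:.3f}")
print("selected healing:", best)
```

In this toy case three of the options tie, so a secondary criterion such as wiring cost could be used to break the tie.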


3.1.1.3 Probabilistic Logic Minimization.

Probabilistic logic minimization is a top-down, architecture-level approach to
inexact design [26]. In this method, the designer looks at the truth table of an exact
circuit, and then considers flipping bits in ways that make the logic simpler. This
results in a circuit which is occasionally erroneous, but has less delay, area, and power
consumption than the exact circuit. As in the probabilistic pruning method, the goal
is for the errors to be infrequent and small in magnitude. There is no hardware
overhead to this technique.

As an example, consider the carry-out function of a one-bit full adder. The exact
logic is

$$C_{out} = ab + bc + ac. \qquad (5)$$

The truth table for this function has eight possible outputs. Equation (5) can be
approximated by

$$C_{out} = ab + c \qquad (6)$$

or

$$C_{out} = a(b + c) \qquad (7)$$


or 


Cout  =  ab  +  be  +  ac  +  abe. 


(8) 


In each case, the truth table is incorrect in one of the eight positions. However, while
Equations (6)-(7) are simpler than (5), Equation (8) is more complicated. Therefore,
(6)-(7) are favorable bit flips and are good candidates for probabilistic logic
minimization, while (8) is unfavorable and would not be used. The errors due to probabilistic
logic minimization can be quantified using Equations (2)-(4).
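
The claim that each favorable approximation differs from the exact carry in exactly one truth-table position can be verified by enumeration. The Python sketch below is an illustration we add, reading Equation (6) as $ab + c$ and Equation (7) as $a(b + c)$; these readings are our assumption about the simplified forms.

```python
from itertools import product

def carry_exact(a, b, c):
    return (a & b) | (b & c) | (a & c)        # Equation (5): majority of a, b, c

candidates = {
    "ab + c":   lambda a, b, c: (a & b) | c,  # assumed reading of Equation (6)
    "a(b + c)": lambda a, b, c: a & (b | c),  # assumed reading of Equation (7)
}

rows = list(product((0, 1), repeat=3))        # the eight truth-table rows (a, b, c)

mismatches = {}
for name, f in candidates.items():
    mismatches[name] = [r for r in rows if f(*r) != carry_exact(*r)]
    print(f"{name}: differs from the exact carry at {mismatches[name]}")
```

Each simplified form disagrees with the exact carry for a single input vector, so with equally likely inputs the error rate of Equation (3) is 1/8 for either choice.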

3.1.1.4 Don’t Care Sets.

Probabilistic pruning and probabilistic logic minimization each create approximate
solutions to digital logic problems. For some input vectors there will be zero error at
the output, and in other cases there will be some error. There is considerable research
[16] regarding the identification of “don’t care” sets of input or output vectors:

• A satisfiability don’t care condition is an input vector that can never occur. For
example, if a digital circuit performs a function $f(a, b)$ on binary inputs $a$ and
$b$, and $a = b$ OR $c$, then $(a, b) = (0, 1)$ is an impossible input vector to $f$.

• An observability don’t care condition occurs when changes to the input vector
do not change the output vector. For a digital vector function $f$ of input vector
$X$, observability don’t care means $\partial f(X)/\partial X = 0$.

Don’t care analysis could be used to identify all the zero-error conditions resulting
from probabilistic pruning or probabilistic logic minimization. However, inexact
computing extends this idea by allowing a nonzero error distribution to be introduced into
the digital system.

3.1.1.5 Mutual Information: The Usefulness of the Estimator.

A  skeptic  will  ask:  Given  that  an  inexact  signal  is  wrong,  how  is  it  any  better 
than  random  noise?  Or  why  not  just  output  zeros  all  the  time?  The  answer  is  that 
the  inexact  signal  contains  more  information  than  a  random  signal  or  a  zero.  Using 
a  concept  called  mutual  information,  we  can  quantify  the  usefulness  of  an  inexact 
signal  relative  to  an  exact  signal. 

From information theory, we define the entropy $H$ as the average uncertainty of a
random variable [37]. For a discrete random variable $X$, entropy can be expressed as
the number of binary digits required to quantify any possible outcome of $X$. Entropy
is calculated as

$$H(X) = -\sum_{\text{all } x} p(x) \log_2 p(x), \qquad (9)$$

where $p(x)$ is the probability mass function of the discrete random variable $X$. If $X$
is uniformly distributed over $2^N$ possible values, then $H(X) = N$ bits, and within the
framework of information theory, $N$ is not necessarily an integer.

The joint entropy $H(X, Y)$ of two discrete random variables $X$ and $Y$ is

$$H(X, Y) = -\sum_{\text{all } x} \sum_{\text{all } y} p(x, y) \log_2 p(x, y), \qquad (10)$$

where $p(x, y)$ is the joint Probability Mass Function (PMF) of $X$ and $Y$. The
conditional entropy of $Y$ given $X$ is

$$H(Y|X) = -\sum_{\text{all } x} p(x) \sum_{\text{all } y} p(y|x) \log_2 p(y|x), \qquad (11)$$

where $p(y|x)$ is the conditional PMF of $Y$ given $X = x$.

The mutual information $I$ of two random variables can be defined as the reduction
in the uncertainty of one variable, given knowledge of the other. For discrete random
variables, $I$ is calculated as

$$I(X, Y) = H(Y) - H(Y|X) \qquad (12)$$

$$= \sum_{\text{all } x} \sum_{\text{all } y} p(x, y) \log_2 \frac{p(x, y)}{p(x)\,p(y)}, \qquad (13)$$

where $p(y)$ is the PMF of $Y$. Now $Y$ provides as much information about $X$ as $X$
does about $Y$, so

$$I(X, Y) = I(Y, X) \qquad (14)$$

$$= H(X) - H(X|Y). \qquad (15)$$



As an example, suppose $X$ is a random variable drawn from a population of
integers uniformly distributed between 1 and 100. Then the entropy $H(X) = \log_2 100 =
6.644$ bits. Now suppose $Y$ is a random variable drawn from a population of integers
such that $0 \le (y - x) \le 3$ for all $x \in X$ and all $y \in Y$, and $(Y - X)$ is uniformly
distributed between 0 and 3. Then $(y - x)$ can be any of four possible values. If
we know $x$, then it takes $\log_2 4 = 2$ bits of additional information to determine $y$, so
the conditional entropy $H(Y|X) = 2$ bits. Approximating $H(Y) \approx H(X)$, it then follows
from (12) that the mutual information $I(X, Y) \approx 6.644 - 2 = 4.644$ bits.
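
The example can also be evaluated numerically from the definitions of entropy and mutual information. The Python sketch below is an illustration we add; it assumes that the offset $Y - X$ is independent of $X$, builds the joint PMF explicitly, and computes $H(Y)$ exactly instead of substituting $H(X)$. Because $Y$ then ranges from 1 to 103 and is not quite uniform, $H(Y)$ comes out slightly larger than $H(X)$, and the mutual information evaluates to about 4.670 bits, compared with the value of 4.644 bits obtained when $H(X)$ is used in place of $H(Y)$.

```python
from collections import defaultdict
from math import log2

# Joint PMF: X uniform on 1..100, offset D = Y - X uniform on 0..3,
# with D assumed independent of X.
joint = {(x, x + d): (1 / 100) * (1 / 4)
         for x in range(1, 101) for d in range(4)}

p_x = defaultdict(float)
p_y = defaultdict(float)
for (x, y), pxy in joint.items():
    p_x[x] += pxy
    p_y[y] += pxy

def entropy(pmf):
    """Entropy in bits of a PMF given as a dict of probabilities."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

H_X = entropy(p_x)
H_Y = entropy(p_y)
# Conditional entropy H(Y|X), using p(y|x) = p(x, y) / p(x)
H_Y_given_X = -sum(pxy * log2(pxy / p_x[x]) for (x, y), pxy in joint.items())
# Mutual information from the double-sum form
I_XY = sum(pxy * log2(pxy / (p_x[x] * p_y[y])) for (x, y), pxy in joint.items())

print(f"H(X)   = {H_X:.3f} bits")
print(f"H(Y)   = {H_Y:.3f} bits")
print(f"H(Y|X) = {H_Y_given_X:.3f} bits")
print(f"I(X,Y) = {I_XY:.3f} bits")
```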

We apply the theory of mutual information to the exact output $Q$, given the
estimator $Q'$ computed by the inexact circuit. The mutual information $I(Q, Q')$ provides
a useful metric of the quality of $Q'$ as an estimator. The larger the value of $I$, the
better $Q'$ is as an estimator. Note that the value of $I$ is based on an assumed probability
distribution of $Q$.

3.1.2  Non-Deterministic. 

Non-deterministic error means error caused by random variables unknown to the
circuit designer. In conventional circuit design, the goal is to eliminate the
uncertainties caused by random errors. However, with inexact computing, non-deterministic
error sources may be tolerated. These sources include:

•  Thermal  noise 

•  Shot  noise 

• Flicker (1/f) noise

•  Radio  Frequency  (RF)  interference 

•  Crosstalk  within  the  chip 

•  Manufacturing  process  variations 

•  Radiation-induced  single  event  upsets 

•  Ionizing  radiation  effects 

3.1.2.1 Analog Systems.

There is no such thing as an error-free analog circuit. Any analog circuit can
be thought of as a hardware implementation of a mathematical function $y = f(x)$,
where the input vector $x$ and output vector $y$ are both functions of time $t$. The input
nodes of the circuit are corrupted by a noise function $W_{in}(t)$, and the output nodes
are corrupted by noise $W_{out}(t)$. By superposition the analog function becomes

$$y(t) = f\left[x(t) + W_{in}(t)\right] + W_{out}(t). \qquad (16)$$

Basic building blocks of analog circuits include:

• Amplifier with gain $A$:

$$y(t) = A \cdot \left[x(t) + W_{in}(t)\right] + W_{out}(t)$$

• Adder (summing junction) with $n$ inputs:

$$y(t) = \sum_{i=1}^{n} \left[x_i(t) + W_{in,i}(t)\right] + W_{out}(t)$$

• Differentiator:

$$y(t) = \frac{d}{dt}\left[x(t) + W_{in}(t)\right] + W_{out}(t)$$

• Integrator with start time $t_0$:

$$y(t) = \int_{t_0}^{t} \left[x(\tau) + W_{in}(\tau)\right] d\tau + W_{out}(t)$$

In the case of the integrator, the slightest nonzero bias in $W_{in}$ causes error to
continuously accumulate with time. In all the above cases, cascading stages of analog
circuits introduces additional error with each stage.

By  contrast,  in  traditional  computer  science,  digital  logic  circuits  are  considered  to 
be  perfectly  deterministic,  error-free  computing  machines.  Traditional  Boolean  logic 
does  not  consider  the  possibility  of  random  errors  in  the  system.  Inexact  computing 
expands  the  notion  of  digital  logic  to  allow  errors  or  approximations  to  be  introduced 
into  the  digital  system. 

3.1.2.2 Binary Logic Affected by Noise.

Schematics  for  a  CMOS  inverter  with  noise  at  its  input  and  at  its  output  are  shown 
in  Fig.  4.  We  expect  the  output  of  an  inverter  to  simply  be  the  binary  complement  of 
its  input.  However,  when  an  inverter  is  degraded  by  random  noise,  its  output  appears 
as  shown  in  Fig.  5.  This  approach  can  be  applied  to  more  complex  circuits  such  as 
adders,  which  is  the  method  used  in  this  dissertation. 
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(a)  Noise  at  input.  (b)  Noise  at  output. 

Figure  4.  CMOS  inverter  with  (a)  additive  white  Gaussian  noise  (AWGN)  at  the  input, 
and  (b)  AWGN  at  the  output. 


Figure  5.  Digitized  input  waveform  A  and  output  waveform  F  of  a  noisy  inverter, 
showing  a  clean  input  and  a  noisy  output.  In  this  example,  the  noise  was  input-coupled 
as shown in Fig. 4a.

3.1.2.3  Probabilistic Boolean Logic.


3.1.2.3.1  Definitions.

Chakrapani defines probabilistic Boolean operators, which are similar to standard Boolean operators except that each yields the correct result only with a certain probability [38, 28, 27]:

$\vee_p$  disjunction (OR) with probability $p$,

$\wedge_q$  conjunction (AND) with probability $q$, and
$\neg_r$  negation (NOT) with probability $r$,
where $\tfrac{1}{2} \le p, q, r \le 1$.

The probabilistic equality operator is denoted as:

$=_s$  is equal to, with a probability $s$,
where $\tfrac{1}{2} \le s \le 1$.

The  probabilistic  AND,  OR,  and  NOT  operations  described  above  are  very  useful 
for  analyzing  complex  circuits  such  as  adders  and  multipliers,  as  described  in  Sections 
4.3-4.4. 
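The three probabilistic operators can be sketched in a few lines of code. This is a minimal illustration, assuming that an incorrect result is simply the complement of the correct bit; the helper names (`_noisy`, `p_or`, `p_and`, `p_not`) are hypothetical and are not taken from [38].

```python
import random

def _noisy(correct_bit, prob, rng):
    """Return correct_bit with probability prob, otherwise its complement."""
    return correct_bit if rng.random() < prob else correct_bit ^ 1

def p_or(a, b, p, rng):
    """Disjunction that is correct with probability p."""
    return _noisy(a | b, p, rng)

def p_and(a, b, q, rng):
    """Conjunction that is correct with probability q."""
    return _noisy(a & b, q, rng)

def p_not(a, r, rng):
    """Negation that is correct with probability r."""
    return _noisy(a ^ 1, r, rng)

rng = random.Random(1)
trials = 100_000
hits = sum(p_and(1, 1, 0.9, rng) == 1 for _ in range(trials))
print(hits / trials)  # empirical probability of a correct AND
```

With a probability parameter of 1 each operator reduces to its deterministic Boolean counterpart, so the same code can serve as both the exact and the inexact model.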

3.1.2.3.2  Identities.

In  standard  Boolean  logic,  there  are  several  identities  which  can  be  extended  into 
probabilistic  Boolean  logic  [38]: 

1.  Commutativity 

2.  Double  Complementation 

3.  Operations  with  0  and  1 

4.  Identity 

5.  Tautology 

6.  DeMorgan’s identities

3.1.2.3.3  Identities which are Not Preserved.


Not all classical Boolean identities can be extended into probabilistic Boolean logic. For binary variables $a$, $b$, and $c$, and probabilities $p_1, \ldots, p_5$, the following identities are not preserved [38]:

1.  Associativity:  $(a \vee_{p_1} (b \vee_{p_1} c)) \not\equiv ((a \vee_{p_2} b) \vee_{p_2} c)$.

2.  Distributivity:  $(a \vee_{p_1} (b \wedge_{p_2} c)) \not\equiv ((a \vee_{p_3} b) \wedge_{p_4} (a \vee_{p_5} c))$.

3.  Absorption:  $(a \wedge_{p_1} (a \vee_{p_2} b)) \not\equiv a$.
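The loss of associativity can be checked by propagating exact output probabilities through the two groupings. The sketch below is illustrative only: it assumes a single probability $p = 9/10$ for every OR gate, independent gate errors, and the inputs $a = 1$, $b = c = 0$; the function names are hypothetical.

```python
from fractions import Fraction

def p_or_dist(dist_a, dist_b, p):
    """Exact output distribution {bit: probability} of a probabilistic OR
    applied to two independent input distributions."""
    out = {0: Fraction(0), 1: Fraction(0)}
    for a, pa in dist_a.items():
        for b, pb in dist_b.items():
            correct = a | b
            out[correct] += pa * pb * p            # gate is correct
            out[correct ^ 1] += pa * pb * (1 - p)  # gate is wrong
    return out

def bit(v):
    """Distribution of a deterministic input bit."""
    return {v: Fraction(1), 1 - v: Fraction(0)}

p = Fraction(9, 10)
a, b, c = bit(1), bit(0), bit(0)
left = p_or_dist(a, p_or_dist(b, c, p), p)    # a OR_p (b OR_p c)
right = p_or_dist(p_or_dist(a, b, p), c, p)   # (a OR_p b) OR_p c
print(left[1], right[1])
```

Under these assumptions the probability of a correct output (a logical 1) is $p = 9/10$ for the first grouping but $p^2 + (1-p)^2 = 41/50$ for the second, so the two expressions are not equivalent even though their deterministic counterparts are.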

3.1.2.4  Probability of Correctness in the Presence of Random Noise.

Consider a pulsed waveform $x$ which varies with time $t$. If we add random noise, then the noisy waveform $\hat{x}$ can be modeled as

$$ \hat{x}(t) = x(t) + w_{\mathrm{in}}(t), \tag{17} $$

where $w_{\mathrm{in}}$ is Additive White Gaussian Noise (AWGN), expressed as

$$ w_{\mathrm{in}}(t) = A_w \cos\left(\omega_{w,x} t + \psi_{w,x}\right), \tag{18} $$

where the amplitude $A_w$, frequency $\omega_{w,x}$, and phase $\psi_{w,x}$ are constant within the short sampling window of interest, but otherwise are random variables distributed as

$$ A_w \sim \mathcal{N}\left(0, \sigma_w^2\right) \tag{19} $$

$$ \omega_{w,x} \sim \mathrm{Unif}\left(\omega_L, \omega_U\right) \tag{20} $$

$$ \psi_{w,x} \sim \mathrm{Unif}\left(0, 2\pi\right), \tag{21} $$

where $\omega_L$ and $\omega_U$ are the lower and upper limits of the signal bandwidth, and $\sigma_w$ is the standard deviation of the noise amplitude. In this dissertation, the notation $\sim \mathcal{N}(\mu, \sigma^2)$ means “is normally distributed with mean $\mu$ and variance $\sigma^2$”, and $\sim \mathrm{Unif}(a, b)$ means “is uniformly distributed between $a$ and $b$”. In the case of thermal noise, the mean thermal energy is

$$ \bar{E} = \tfrac{1}{2} k_B T = \tfrac{1}{2} C_x \left\langle w_{\mathrm{in}}^2(t) \right\rangle, \tag{22} $$

where $k_B$ is Boltzmann’s constant, $T$ is the absolute temperature, $C_x$ is the capacitance of the circuit node, and $\langle w_{\mathrm{in}}^2(t) \rangle$ is the mean of the square of the thermal noise voltage over time [29]. Since $w_{\mathrm{in}}$ is normally distributed in amplitude, the average $\langle w_{\mathrm{in}}^2(t) \rangle$ is exponentially distributed with mean $\sigma_w^2$ [39], which implies a rate parameter $\sigma_w^{-2}$. This exponential distribution is denoted as

$$ \left\langle w_{\mathrm{in}}^2(t) \right\rangle \sim \mathrm{Exponential}\left(\sigma_w^{-2}\right). \tag{23} $$

Noise sources can create errors in digital circuits. These effects can be simulated using Simulation Program with Integrated Circuit Emphasis (SPICE) or Spectre software tools, as described in Section 4.1.
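To give a sense of scale for the thermal noise relation in (22), solving it for the root-mean-square noise voltage gives $\sqrt{k_B T / C_x}$. The sketch below evaluates this for a hypothetical 1 fF node at 300 K; both values are illustrative assumptions and are not taken from the circuits studied in this dissertation.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant in J/K (exact SI value)

def thermal_noise_rms(capacitance_f, temperature_k=300.0):
    """RMS thermal noise voltage on a node, from (1/2) k_B T = (1/2) C <w^2>."""
    return math.sqrt(K_B * temperature_k / capacitance_f)

sigma = thermal_noise_rms(1e-15)  # hypothetical 1 fF node at 300 K
print(f"{sigma * 1e3:.2f} mV")
```

For these assumed values the result is roughly 2 mV, which indicates why thermal noise becomes more significant as node capacitances and supply voltages shrink.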

3.2  Adders 

A  large  part  of  the  JPEG  image  compression  algorithm,  as  described  in  Section 
3.6,  consists  of  addition  and  multiplication;  and  multiplication  is  built  upon  addition. 
A digital adder computes the sum of two $N$-bit integers $A$ and $B$ to produce an $N$-bit output “sum” $S$ and a carry-out bit $C_{\mathrm{out}}$. The simplest $N$-bit adder, the Ripple-Carry Adder (RCA), is composed of $N$ one-bit adders in parallel, with a carry signal connecting each bit to the one above it. In many cases, the sum of $A$ and $B$ will require $N + 1$ bits to store, and therefore $S$ is not always equal to $A + B$. In this research, the sum of $A$ and $B$ is called the augmented sum $S^+$, where

$$ S^+ = A + B \tag{24} $$

and

$$ S^+ = 2^N C_{\mathrm{out}} + S. \tag{25} $$

The $i$th bits of $A$, $B$, and $S$ are written as $a_i$, $b_i$, and $s_i$ respectively, where

$$ A = \sum_{i=0}^{N-1} 2^i a_i, \tag{26} $$

$$ B = \sum_{i=0}^{N-1} 2^i b_i, \text{ and} \tag{27} $$

$$ S = \sum_{i=0}^{N-1} 2^i s_i. \tag{28} $$

For  most  applications,  the  ripple-carry  adder  is  the  slowest  adder  architecture. 
There  are  many  adders  which  are  optimized  for  speed,  for  example:  carry-lookahead, 
Brent-Kung,  Han-Carlson,  and  Kogge-Stone  adders  [40].  Previous  work  [35]  has 
applied  inexactness  to  these  more  sophisticated  adders.  In  this  research,  we  will  use 
these  types  of  inexact  adders  in  the  image  compression  algorithm. 

3.2.1  1-Bit  Full  Adder. 

The one-bit Full Adder (FA) is the basic building block of a digital adder. It takes three binary inputs $a$, $b$, and $C_{\mathrm{in}}$ (where $C_{\mathrm{in}}$ is known as the carry-in) and produces binary outputs $s$ (the sum) and $C_{\mathrm{out}}$ (the carry-out) according to the truth table in Table 1. The circuit diagram for a one-bit full adder is shown in Figure 6.

Table 1.  Truth Table for a 1-Bit Full Adder

  a   b   C_in   C_out   s
  0   0   0      0       0
  0   0   1      0       1
  0   1   0      0       1
  0   1   1      1       0
  1   0   0      0       1
  1   0   1      1       0
  1   1   0      1       0
  1   1   1      1       1

Table 2.  Truth Table for a 1-Bit Half Adder

  a   b   C_out   s
  0   0   0       0
  0   1   0       1
  1   0   0       1
  1   1   1       0

3.2.2  1-Bit  Half  Adder. 


The one-bit Half Adder (HA) is a simplified version of the full adder. The half adder is used when there is no carry-in bit, for example in the lowest-order bit of an $N$-bit adder. The truth table for the one-bit half adder is shown in Table 2, and the circuit diagram is shown in Figure 7.

Figure 7.  1-bit half adder


3.2.3  N-bit Ripple-Carry Adder.

An $N$-bit ripple-carry adder consists of $N$ one-bit adders in parallel, where the $i$th carry-out bit $c_i$ becomes the carry-in bit to the $(i + 1)$th column of the adder. The schematic for an $N$-bit ripple-carry adder is shown in Figure 8.

Figure 8.  N-bit ripple-carry adder

3.2.4  Propagate/Generate  Logic. 

A  carry  lookahead  adder  contains  logic  designed  to  quickly  compute  the  higher- 
value  carry  bits,  so  the  adder  does  not  have  to  wait  for  the  carry  to  ripple  from  the 
lowest  to  the  highest  bit,  as  is  the  case  with  the  ripple-carry  adder.  This  logic  is  called 
propagate  and  generate  logic.  A  generate  condition  exists  if  a  column  (or  group  of 
columns)  generates  a  carry.  A  propagate  condition  exists  if  column(s)  propagate  a 
carry which was generated by a lower-value column. The propagate condition $P_{i:j}$ for the $j$th stage of the $i$th column is determined by

$$ P_{i:j} = P_{i:k} \wedge P_{k-1:j}, \tag{29} $$

where $k$ represents a previous stage, and $\wedge$ is the logical AND operator. The generate condition $G_{i:j}$ is determined by

$$ G_{i:j} = \left(P_{i:k} \wedge G_{k-1:j}\right) \vee G_{i:k}, \tag{30} $$

where $\vee$ is the logical OR operator. The schematic symbols for these expressions are illustrated in Fig. 9.

Figure 9.  Propagate/Generate logic gates [40].

3.2.5  Ripple-Carry  Adder. 

A ripple-carry adder is the simplest digital adder, requiring fewer components and less area than any other adder. It is also the slowest type of adder, because it computes every column of the sum one at a time. All other adders have additional components designed to speed up computation by computing the higher-order bits in parallel with the lower-order bits. The schematic for a 16-bit ripple-carry adder is shown in Fig. 10.
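The column-by-column behavior of the ripple-carry adder can be sketched directly from the full adder truth table (Table 1). The following is an illustrative model only; the function names are hypothetical, and the loop mirrors the serial carry chain rather than any timing behavior.

```python
def full_adder(a, b, c_in):
    """One-bit full adder: returns (s, c_out) as in Table 1."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (c_in & (a ^ b))
    return s, c_out

def ripple_carry_add(a, b, n):
    """Add two n-bit integers one column at a time.

    Returns (S, C_out), so that the augmented sum is 2**n * C_out + S.
    """
    carry = 0
    total = 0
    for i in range(n):  # the carry ripples from bit 0 up to bit n-1
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry

s, c_out = ripple_carry_add(11, 7, 4)
print(s, c_out)  # the 4-bit sum and the carry-out
```

Here $11 + 7 = 18$ does not fit in four bits, so the adder returns $S = 2$ with $C_{\mathrm{out}} = 1$, consistent with the augmented sum $S^+ = 2^N C_{\mathrm{out}} + S$.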

3.2.6  Carry  Lookahead  Adder. 

A carry lookahead adder splits the addition problem into subgroups of bits, and has additional hardware, called the carry lookahead logic, which enables fast computation of the higher-order bits. The number of bits in a subgroup is called the valency or radix of the adder. For a radix 4 carry lookahead adder,

$$ G_{i:j} = G_{i:k} + P_{i:k}\left(G_{k-1:l} + P_{k-1:l}\left(G_{l-1:m} + P_{l-1:m} G_{m-1:j}\right)\right) \tag{31} $$

and

$$ P_{i:j} = P_{i:k} P_{k-1:l} P_{l-1:m} P_{m-1:j}, \tag{32} $$

as represented by the black circles in the bottom left of Fig. 9 and the top of Fig. 11 [40]. The schematic for a 16-bit radix 4 carry lookahead adder is shown in Fig. 11.

Figure 10.  16-bit ripple-carry adder schematic [40].

Figure 11.  16-bit radix 4 carry lookahead adder schematic [40].

Figure 12.  16-bit Kogge-Stone adder schematic [40].

3.2.7  Kogge-Stone  Adder. 

The Kogge-Stone adder is a type of carry lookahead adder. In an $N$-bit Kogge-Stone adder, the carry lookahead logic has $\log_2 N$ stages. The schematic for a 16-bit Kogge-Stone adder is shown in Fig. 12. The inputs to the first stage, $P_{i:0}$ and $G_{i:0}$, are determined by

$$ P_{i:0} = a_i \oplus b_i \tag{33} $$

$$ G_{i:0} = a_i \wedge b_i, \tag{34} $$

where $a_i$ and $b_i$ are the $i$th bits of the adder inputs $A$ and $B$, and $\oplus$ is the exclusive-OR operator. The $i$th sum bit $s_i$ is determined by

$$ s_i = P_{i:0} \oplus G_{i-1:\log_2 N}. \tag{35} $$

3.2.8  Ling Adder.


A Ling adder [41] uses pseudogenerate and pseudopropagate signals $H_{i:j}$ and $I_{i:j}$ in place of the regular generate and propagate signals $G_{i:j}$ and $P_{i:j}$ in (29)-(30) and Fig. 9. The Ling adder used in the IBM Power4 microprocessor is a radix 4 carry lookahead adder [42]. Any adder that uses propagate/generate logic can use the Ling technique. The only differences are in the precomputation of $H_{i:0}$ and $I_{i:0}$ at the top of the schematic, and the computation of the final sum bits at the bottom. The initial pseudogenerate is computed as [40]


(36) 


and  the  initial  pseudopropagate  is 


(37) 


li-.O  —  ®i:0  +  bi-.Q. 


The advantage of this is that it replaces an exclusive-OR with an OR on the critical path, which makes the algorithm faster. Then, instead of using (31) to compute the carry lookahead logic, we use the simpler expression


(38) 


where  Ki,j  =  Jj+i-j+i,  and  instead  of  (32)  we  use 


(39) 


The final sum bit $s_i$ is computed as


Si  —  Hi-i-Q  [Pi:o  ©  +  Hi-i-oPi-Q. 


(40)

3.2.9  Probabilistic Boolean Logic (PBL).


In this research, we use PBL to analyze complex circuits such as adders. The probabilistic AND, OR, and NOT operations described in Section 3.1.2.3 are very useful for this purpose, as described in Sections 3.2.10-3.2.11. Although [38] does not define a probabilistic exclusive-or (XOR), for the purpose of this research we define it as

$$ a \oplus_p b = \left(a \wedge_1 \neg_1 b\right) \vee_p \left(\neg_1 a \wedge_1 b\right) \tag{41} $$

for binary numbers $a$ and $b$, where $\oplus_p$ is the XOR operation with probability $p$ of correctness.

3.2.10  Propagate/Generate  Logic  with  PBL. 

From the perspective of Probabilistic Boolean Logic, (29)-(30) can be modified as follows:

$$ \hat{P}_{i:j} = \hat{P}_{i:k} \wedge_p \hat{P}_{k-1:j} \tag{42} $$

$$ \hat{G}_{i:j} = \left(\hat{P}_{i:k} \wedge_1 \hat{G}_{k-1:j}\right) \vee_p \hat{G}_{i:k}, \tag{43} $$

where $\hat{P}_{i:j}$ and $\hat{G}_{i:j}$ are noisy approximations for $P_{i:j}$ and $G_{i:j}$. In this analysis, the AND-OR-21 (AO21) gate in Fig. 9 is regarded as a single entity, and for this reason the probability $p$ appears only once in (43).


3.2.11  Kogge-Stone  Adder  with  PBL. 

Figure 13.  N-bit integer multiplier. (a) The $k$th stage, not including stage $k = 0$. *For stage $k = 1$, a half adder can be used for the highest-order bit. (b) All $(N - 1)$ stages cascaded together.

PBL can also be applied to the input of the Kogge-Stone adder:

$$ \hat{P}_{i:0} = a_i \oplus_p b_i \tag{44} $$

$$ \hat{G}_{i:0} = a_i \wedge_p b_i \tag{45} $$

and to the sum bits:

$$ \hat{s}_i = \hat{P}_{i:0} \oplus_p \hat{G}_{i-1:\log_2 N}, \tag{46} $$

where $\hat{s}_i$ is a noisy approximation for $s_i$.
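The probabilistic Kogge-Stone adder can be sketched as a bit-level simulation in which every gate output is flipped with probability $1 - p$. This is a simplified illustration, not the Matlab model used in this research: it assumes a single probability $p$ for all gates, independent errors, and a gate that outputs the complement of the correct bit when it fails. The names `noisy` and `kogge_stone_add` are hypothetical.

```python
import random

def noisy(bit, p, rng):
    """Return bit with probability p, otherwise its complement."""
    return bit if rng.random() < p else bit ^ 1

def kogge_stone_add(a, b, n, p=1.0, rng=None):
    """n-bit Kogge-Stone addition with per-gate correctness probability p.

    Returns (sum, carry_out). With p = 1.0 the adder is exact.
    """
    rng = rng or random.Random(0)
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    # Stage 0: bitwise propagate (XOR) and generate (AND) signals.
    prop0 = [noisy(x ^ y, p, rng) for x, y in zip(a_bits, b_bits)]
    gen = [noisy(x & y, p, rng) for x, y in zip(a_bits, b_bits)]
    prop = prop0[:]
    dist = 1
    while dist < n:  # log2(n) prefix stages when n is a power of two
        new_gen, new_prop = gen[:], prop[:]
        for i in range(dist, n):
            # The AND-OR gate is treated as one noisy entity, the AND as another.
            new_gen[i] = noisy(gen[i] | (prop[i] & gen[i - dist]), p, rng)
            new_prop[i] = noisy(prop[i] & prop[i - dist], p, rng)
        gen, prop = new_gen, new_prop
        dist *= 2
    total = 0
    for i in range(n):
        carry_in = gen[i - 1] if i > 0 else 0  # carry into column i
        total |= noisy(prop0[i] ^ carry_in, p, rng) << i
    return total, gen[n - 1]

exact = kogge_stone_add(40000, 30000, 16)
inexact = kogge_stone_add(40000, 30000, 16, p=0.99, rng=random.Random(7))
print(exact, inexact)
```

Running the same inputs with $p = 1$ and with $p < 1$ gives the exact and inexact results side by side, which is the kind of comparison needed to build an error distribution for an inexact adder.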


3.3  Multipliers 

A multiplier computes the product $P$ of two $N$-bit integers $A$ and $B$. For an integer multiplier, $P$ may require up to $2N$ bits to store. In a basic multiplier, the product is computed as

$$ P = 2^{N-1} P_{N-1} + \sum_{k=1}^{N-1} 2^{k-1} p_k, \tag{47} $$

where $P_k$ is the $k$th partial product, $p_k$ is the $k$th bit of $P$, and

$$ p_k = p_{k-1,0}, \tag{48} $$

where $p_{k-1,0}$ is the zeroth bit of $P_{k-1}$. The zeroth partial product $P_0$ is computed as

$$ P_0 = b_0 \sum_{i=0}^{N-1} 2^i a_i, \tag{49} $$

and the remaining $(N - 1)$ partial products $P_k$ are computed according to

$$ P_k = \sum_{i=0}^{N-1} 2^i \left(a_i b_k + p_{k-1,i+1}\right), \tag{50} $$

where $p_{k-1,i+1}$ is the $(i + 1)$th bit of $P_{k-1}$, so

$$ P_k = \sum_{i=0}^{N} 2^i p_{k,i}. \tag{51} $$

By Equation (51), $P_k$ requires $(N + 1)$ bits, and from (47) it is apparent that the final product $P$ of an exact integer multiplier requires $2N$ bits. The schematic for the integer multiplier is shown in Figure 13.
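The partial-product recurrence can be checked with a short exact model. The sketch below is an illustration under the assumption that each stage adds $b_k A$ to the previous partial product shifted right by one bit, with the bit shifted out becoming a finished product bit; the function name is hypothetical and no inexactness is modeled here.

```python
def multiply_partial_products(a, b, n):
    """Multiply two n-bit integers by accumulating n partial products."""
    partial = a * (b & 1)  # zeroth partial product: b_0 * A
    low_bits = 0
    for k in range(1, n):
        # The zeroth bit of the previous partial product is a finished product bit.
        low_bits |= (partial & 1) << (k - 1)
        # Next partial product: b_k * A plus the previous one shifted right by one.
        partial = a * ((b >> k) & 1) + (partial >> 1)
    # The last partial product supplies the upper n + 1 bits of the 2n-bit result.
    return (partial << (n - 1)) | low_bits

print(multiply_partial_products(13, 11, 4))
```

Each partial product needs at most $N + 1$ bits, so in hardware each stage can be built from an $N$-bit adder, which is where inexact adders would enter an inexact multiplier.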

3.4  Probability  Distributions 

If we treat the inputs $A$ and $B$ to an inexact adder as random variables, it follows that $S^+$, $\hat{S}^+$, and $\varepsilon$ are also random variables. In this research, we fit the normalized error $\varepsilon$ to various hypothetical probability distributions.

3.4.1  Gaussian Distribution.

A Gaussian (normal) distribution is characterized by its Probability Density Function (PDF)

$$ f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \tag{52} $$

with respect to some random variable $X$, where $\mu$ is the mean and $\sigma$ is the standard deviation. Given a sample of data, the most likely Gaussian distribution to fit the sample has parameters $\mu = \hat{\mu}$ and $\sigma = \hat{\sigma}$, where $\hat{\mu}$ is the sample mean and $\hat{\sigma}$ is the sample standard deviation.


3.4.2  Laplacian  Distribution. 

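A Laplacian (double exponential) distribution has the PDF

$$ f_X(x) = \frac{\alpha}{2}\, e^{-\alpha |x|} \tag{53} $$

with scale parameter $\alpha$ and standard deviation $\sqrt{2}/\alpha$ [39]. Given a sample $x_n$ of data, consisting of $n$ observations $x_1, x_2, \ldots, x_n$, the most likely Laplacian distribution to fit the sample has parameter

$$ \hat{\alpha} = \frac{n}{\sum_{i=1}^{n} \left|x_i - \tilde{x}\right|}, \tag{54} $$

where $\hat{\alpha}$ is an estimate of $\alpha$, and $\tilde{x}$ is the sample median [43].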

3.4.3  Normal  Product  Distribution. 

A Normal Product (NP) distribution arises from the product $u$ of $\psi$ normally-distributed random variables $X_1, X_2, \ldots, X_\psi$, where $\psi$ is the order of the distribution, and the product is

$$ u = \prod_{i=1}^{\psi} X_i. \tag{55} $$

Figure 14.  Probability density functions for Gaussian (with $\sigma = 1$), Laplacian (with $\alpha = 1$), NP2, NP3, and NP20 (each with $\sigma = 1$). (a) Linear scale. (b) Semilogarithmic scale.

We introduce the abbreviation NP$\psi$ which means “$\psi$th-order normal product distribution”. Random variables $X_1, X_2, \ldots, X_\psi$ have standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_\psi$ respectively, and the product $\sigma = \sigma_1 \sigma_2 \cdots \sigma_\psi$ is the standard deviation of $u$. A Gaussian distribution is a special case of a normal product distribution, where $\psi = 1$.

Nadarajah [44] gives the exact formula for the PDF of a normal product distribution as the more general case of the product of $\psi$ Student’s $t$ distributions, where $\nu_i$ is the number of degrees of freedom of the $i$th $t$ distribution, and $\nu_i \to \infty$ for all $i \ne i_1, i_2, \ldots, i_\phi$:


fu{u) 


uT  {uij2)  P  {uij2)  ■  ■  ■  P 

/ 

v 


^,'ip 

■ 

^0 

1  1 

2’  2’  • 

1  \ 

•  2 

V? 

2  ’  2  ’  * 

0 

••  2  y 

(56) 


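for $u > 0$, where $\phi$ is the total number of non-infinite values of $\nu_i$, $\Gamma$ is the gamma function, and $G$ is the Meijer G-function. In this research, we are interested in the product of Gaussian distributions, in which case $\phi = 0$ and (56) simplifies to

$$ f_U(u) = \frac{1}{\pi^{\psi/2}\, u}\, G^{\psi,0}_{0,\psi}\!\left(\frac{u^2}{2^{\psi}} \,\middle|\, \tfrac{1}{2}, \tfrac{1}{2}, \ldots, \tfrac{1}{2}\right). \tag{57} $$

Since the distribution is symmetric about $u = 0$, we can use $|u|$ in place of $u$. Applying the scale factor $\sigma$, the generic formula for a normal product distribution is

$$ f_U(u) = \frac{1}{\pi^{\psi/2}\, |u|}\, G^{\psi,0}_{0,\psi}\!\left(\frac{u^2}{2^{\psi} \sigma^2} \,\middle|\, \tfrac{1}{2}, \tfrac{1}{2}, \ldots, \tfrac{1}{2}\right). \tag{58} $$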


Fig. 14 compares the PDFs of Gaussian, Laplacian, NP2, NP3, and NP20 distributions. All are symmetric about the central peak. PDFs for all NP distributions with $\psi \ge 2$ diverge to infinity as $x$ approaches zero. The order $\psi$ of an NP distribution is important, because as $\psi$ increases, so does the kurtosis of the distribution. Kurtosis can be interpreted as the narrowness of the peak of the PDF, or alternatively as the heaviness of the tails. This is evident in the NP20 distribution in Fig. 14b, as compared with the NP2 and NP3 distributions.

3.4.4  Maximum  Likelihood  Estimation. 

In order to determine which probability distribution is the best fit for a sample containing $n$ observations of data, the maximum likelihood estimation method is used. The likelihood $\ell$ that a sample represents a distribution with PDF $f_X$ is

$$ \ell = \prod_{j=1}^{n} f_X\left(x_j \mid \theta\right), \tag{59} $$

where $\theta$ are the parameters of the distribution. It is usually more convenient to work with the log-likelihood $L$ of the distribution, because this allows us to work with sums instead of products. The log-likelihood is

$$ L = \ln \ell = \sum_{j=1}^{n} \ln f_X\left(x_j \mid \theta\right). \tag{60} $$

The parameters $\theta$ which maximize $\ell$, or equivalently $L$, specify the best fit for any given $f_X$. Furthermore, the maximum likelihood estimation method makes it possible to compare different distributions; for example, it can determine whether a Gaussian, Laplacian, or normal product distribution is the best fit for the sample. To maximize $L$, the derivative of $L$ with respect to $\theta$ is set to zero:

$$ \frac{\partial L}{\partial \theta} = 0, \tag{61} $$

and the value of $\theta$ which satisfies (61) specifies the best fit [39].

According to (58), each normal product distribution PDF for continuous-valued random variables diverges to infinity as $u$ approaches zero. This makes it impossible to perform a maximum likelihood estimation if the data sample contains zeros. This problem can be avoided by treating $U$ as a discrete-valued random variable with bin width di, where


Assuming  a  real  discrete-valued  U,  there  is  a  finite  probability  that  U  is  equal  to 
some  discrete  value  u: 


(63) 


Now $P_U$ can be used in place of $f_U$ in (60) and the maximum likelihood estimation can be performed.
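The comparison of candidate distributions by log-likelihood can be sketched for the two simplest cases, the Gaussian and the Laplacian. This is an illustration only: it assumes the Laplacian is centered on the sample median, uses the $1/n$ (maximum likelihood) form of the variance, and leaves out the normal product case because evaluating the Meijer G-function needs a special-function library. The function names are hypothetical.

```python
import math
import random
import statistics

def gaussian_loglik(xs):
    """Maximized Gaussian log-likelihood of the sample."""
    n = len(xs)
    mu = statistics.fmean(xs)
    var = sum((x - mu) ** 2 for x in xs) / n  # ML estimate of the variance
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def laplacian_loglik(xs):
    """Maximized Laplacian log-likelihood, centered on the sample median."""
    n = len(xs)
    med = statistics.median(xs)
    alpha = n / sum(abs(x - med) for x in xs)  # ML estimate of the scale parameter
    return n * (math.log(alpha / 2) - 1)

rng = random.Random(0)
# Synthetic Laplacian sample: an exponential magnitude with a random sign.
sample = [rng.expovariate(1.0) * rng.choice((-1, 1)) for _ in range(5000)]
best = max(("Gaussian", gaussian_loglik(sample)),
           ("Laplacian", laplacian_loglik(sample)),
           key=lambda pair: pair[1])
print(best[0])
```

The distribution with the larger maximized log-likelihood is taken as the better fit; for the synthetic sample above, which is drawn from a Laplacian, the Laplacian fit scores higher.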

3.5  IEEE  754  Floating  Point  Storage 

The  JPEG  compression  algorithm  requires  floating  point  operations  in  order  to 
perform  the  Discrete  Cosine  Transformation  (DCT).  For  floating  point  numbers,  the 
IEEE  754  standard  is  the  most  commonly  used  method  of  storage  [45].  According  to 
the  standard,  numbers  can  be  stored  within  16,  32,  64,  or  128-bit  words,  and  can  be 
stored  in  base  b  =  2  or  b  =  10  format.  Each  floating  point  number  contains  three 
components: a sign bit $s$, an exponent $e$, and a mantissa $m$, where in this notation $1 \le m < 2$. Accordingly, for any floating point number $A$,

$$ A = (-1)^{s}\, b^{\,e - e_0}\, m, \tag{64} $$

where $e_0$ is the offset bias of the exponent, in accordance with Table 3. Each format includes an $N_e$-bit exponent and an $N_m$-bit mantissa, where the values for $N_e$ and $N_m$ are shown in Table 3. For any normalized nonzero binary number, the first digit of $m$ is always 1, and is omitted. Therefore, only $(N_m - 1)$ bits are stored.

Table 3.  IEEE 754 Standard Base-2 Formats

  Name        Common Name           b    e_0     N_e   N_m
  binary16    Half precision        2    15      5     10+1
  binary32    Single precision      2    127     8     23+1
  binary64    Double precision      2    1023    11    52+1
  binary128   Quadruple precision   2    16383   15    112+1
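As an illustration of the storage format, the sketch below unpacks a binary32 number into its sign, biased exponent, and mantissa, and then rebuilds the value from those three fields. It handles normalized numbers only (zero, subnormals, infinities, and NaN are outside this sketch), and the function name is hypothetical.

```python
import struct

def decode_binary32(x):
    """Split a normalized binary32 value into (s, e, m) with 1 <= m < 2."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 sign bit
    exponent = (bits >> 23) & 0xFF    # 8-bit biased exponent
    fraction = bits & 0x7FFFFF        # 23 stored mantissa bits
    mantissa = 1 + fraction / 2**23   # restore the omitted leading 1
    return sign, exponent, mantissa

s, e, m = decode_binary32(-6.5)
value = (-1) ** s * 2.0 ** (e - 127) * m   # bias e_0 = 127 for binary32
print(s, e, m, value)
```

For the input $-6.5 = -1.625 \times 2^2$, the fields are $s = 1$, $e = 129$, and $m = 1.625$.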


3.5.1  Floating  Point  Addition. 

In order to add two numbers in IEEE 754 format, they must first have the same value for the exponent $e$. To obtain this, the mantissa of the smaller number is shifted right, and its exponent incremented, until the exponents of the two numbers match. Trailing bits of the mantissa are truncated. When the two exponents match, the mantissas can be added. The carry-out bit $C_{\mathrm{out}}$ of this addition is then added to the exponent. To find the sum $S$ of two floating point numbers $A$ and $B$, where $B > A$,

$$ m_S = m_B + \left(m_A \gg (e_B - e_A)\right) - 2C_{\mathrm{out}} \tag{65} $$

$$ e_S = e_B + C_{\mathrm{out}}, \tag{66} $$

where $e_A$, $e_B$, and $e_S$ are the exponents of $A$, $B$, and $S$; $m_A$, $m_B$, and $m_S$ are the mantissas of $A$, $B$, and $S$; and $\gg$ denotes bitwise right shifting by $(e_B - e_A)$ bits.

3.5.2  Floating Point Multiplication.

Multiplication of two numbers $A$ and $B$ in IEEE 754 format consists of multiplying the mantissas $m_A$ and $m_B$ together, and then adding the exponents. The leading bits of the mantissa are preserved; the rest are truncated. To find the product $P$,

$$ m_P = m_A m_B \tag{67} $$

$$ e_P = e_A + e_B - e_0, \tag{68} $$

where $m_P$ is the mantissa of $P$, and $e_P$ is the exponent of $P$. The bias $e_0$ is subtracted once so that $e_P$ remains in the biased form used in (64).

3.6  JPEG  Compression  Algorithm 

The  Joint  Photographic  Experts  Group  (JPEG)  compression  algorithm,  also  known 
as  the  JPEG  File  Interchange  Format  (JFIF)  compression  algorithm,  consists  of  the 
following  steps  [46,  47]: 

1.  Color Space Transformation (CST)

2.  Tiling

3.  Discrete Cosine Transformation (DCT)

4.  Quantization 

5.  Zigzagging 

6.  Run-amplitude  encoding,  and 

7.  Huffman  encoding. 

These  steps  are  illustrated  in  the  block  diagram  in  Fig.  2. 

3.6.1  Color  Space  Transformation. 

First, the image is converted from RGB format to YCbCr format. The reason for this is that the Human Visual System (HVS) is more sensitive to intensity than it is to color [48]. By converting to YCbCr format, we can optimize the quality of the intensity (luminance) $Y$ at the expense of the colors (chrominance) $C_b$ and $C_r$. The conversion is as follows [49]:

$$ Y = 0.299R + 0.587G + 0.114B; \tag{69} $$

$$ C_b = -0.16874R - 0.33126G + 0.5B + 128; \tag{70} $$

$$ C_r = 0.5R - 0.41869G - 0.08131B + 128. \tag{71} $$

Each  component  is  processed  independently  of  the  other  components. 
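The color space transformation is a set of three weighted sums per pixel, as the following sketch shows. It assumes 8-bit $R$, $G$, and $B$ values in the range 0 to 255 and returns unrounded floating point results; the function name is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel to (Y, Cb, Cr) using the JFIF coefficients."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.16874 * r - 0.33126 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.41869 * g - 0.08131 * b + 128
    return y, cb, cr

print(rgb_to_ycbcr(255, 255, 255))  # white: full luminance, neutral chrominance
print(rgb_to_ycbcr(255, 0, 0))      # pure red: Cr rises above 128
```

Each output component costs three multiplications and two or three additions, so the transformation is a natural candidate for the inexact adders and multipliers considered in this research.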

3.6.2  Tiling. 

Each component ($Y$, $C_b$, and $C_r$) is then arranged into 8×8 tiles of pixels. Each
tile  is  processed  independently  of  the  other  tiles.  The  color  components  are  typically 
sampled  at  only  half  the  rate  of  the  intensity  component  in  the  y  direction;  that  is, 
the  vertical  resolution  of  the  color  components  is  half  the  resolution  of  the  luminance 
component. 

3.6.3  Discrete  Cosine  Transformation. 

Next,  the  two-dimensional  DCT  transform  is  performed  on  each  tile: 

$$ C = U X U^{T}, \tag{72} $$

where  X  is  the  8x8  tile  of  intensity  or  color  data,  U  is  the  orthogonal  DCT  transform 
matrix,  and  C  is  an  8  x  8  matrix  of  frequency  components  in  the  horizontal  and  vertical 
directions.  The  zero-frequency  component  is  in  the  upper  left  corner  of  the  matrix, 
and  is  known  as  the  dc  component;  the  others  are  called  the  ac  components.  Assuming 
the image is a “smooth” function of $x$ and $y$, most of the DCT components will be close to zero.

A single 8×8 DCT block produces images as shown in Fig. 15. Each subfigure shows the effect of one of the 64 DCT components being active, with a value of 1023, while all the other components are zero. Only a subset of the 64 possible elements is illustrated in this figure. As we will see in Chapter V, if the processor erroneously computes a DCT value which is too large, we will see artifacts which look like the pictures in Fig. 15.
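The matrix form of the two-dimensional DCT can be sketched as follows. The sketch assumes the orthonormal DCT-II definition of $U$, which is the usual choice but is not spelled out in the text above, and uses plain nested lists with exact arithmetic; all function names are hypothetical.

```python
import math

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix U (an assumed normalization)."""
    u = [[0.0] * n for _ in range(n)]
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        for j in range(n):
            u[k][j] = scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
    return u

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(x):
    """Forward transform C = U X U^T."""
    u = dct_matrix(len(x))
    return matmul(matmul(u, x), transpose(u))

def idct2(c):
    """Inverse transform X = U^T C U, valid because U is orthogonal."""
    u = dct_matrix(len(c))
    return matmul(matmul(transpose(u), c), u)

tile = [[10.0] * N for _ in range(N)]  # a perfectly smooth (constant) tile
coeffs = dct2(tile)
print(round(coeffs[0][0], 6))          # only the dc component is nonzero
```

Applying `idct2` to a matrix that is zero except for a single entry produces a pattern of the kind described for Fig. 15, which is also what a single erroneous DCT coefficient adds to the decoded tile.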

In this research, we simulate probabilistic Boolean logic by randomly flipping
bits inside each adder and each multiplier. Matrix multiplication is built from these
inexact  adders  and  multipliers.  The  DCT  is  built  from  matrix  multiplication.  Errors 
in  the  DCT  result  in  artifacts  like  those  shown  in  Fig.  15.  Multiple  erroneous  DCT 
artifacts  may  be  superimposed  onto  each  other,  as  well  as  onto  the  desired  data.  An 
example  of  erroneous  DCT  data  is  shown  in  Fig.  41f. 


Figure 15.  Elementary 8×8 JPEG images, showing the result of a single DCT. In each subfigure, 63 of the 64 values in the DCT matrix are zeros, except for one value which is 1023. The row and column number of the nonzero element is shown in each caption. Panels: (a) (1,1); (b) (1,2); (f) (2,1); (g) (6,1); (h) (2,2); (i) (2,5); (j) (3,3); (k) (6,3); (l) (4,4); (m) (7,6); (n) (8,8).


In general, finding the product of three 8×8 matrices requires 1,024 multiply operations and 896 addition operations. However, due to the sparseness of the information in $U$, it is possible to simplify the computational complexity of the DCT. Table 4 shows the complexities of various DCT algorithms. Despite these improvements, computation of the DCT typically takes about 45 percent of the processing time for the JPEG compression.

Table 4.  Complexity of Various DCT Algorithms for an 8×8 Input Block [48, 47]

  Algorithm                           Multiplications   Additions   Reference
  Block factors                       464               144         [50]
  2-D Fast Fourier Transform (FFT)    104               474         [51]
  Walsh-Hadamard Transform (WHT)      depends on WHT used           [52]
  1-D Chen                            256               416         [53]
  1-D Lee                             192               464         [54]
  1-D Loeffler, Ligtenberg            176               464         [55]
  2-D Kamangar, Rao                   128               430         [56]
  2-D Cho, Lee                        96                466         [57]
  1-D Winograd                        80                464         [58]


3.6.4  Quantization. 


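The DCT matrix $C$ is then converted to an integer matrix $Q$ via quantization. Matrix $C$ is divided element-wise by a quantization matrix $Z$ and a quantization scale factor $\alpha$:

$$ q_{ij} = \mathrm{round}\left(\frac{c_{ij}}{\alpha z_{ij}}\right), \tag{73} $$

where $c_{ij}$, $z_{ij}$, and $q_{ij}$ are the $(i,j)$th elements of $C$, $Z$, and $Q$ respectively. A “quality factor” $q$, which is a percentage, is often specified in lieu of $\alpha$. The scale factor $\alpha$ is calculated from $q$ as

$$ \alpha = \begin{cases} \dfrac{50}{q}, & q < 50\% \\[1ex] 2 - \dfrac{q}{50}, & q \ge 50\%. \end{cases} \tag{74} $$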

In Equation (73), each $q_{ij}$ is rounded to the nearest integer. Some information is lost at this point. The quantization matrices are customizable and are saved within the JPEG file; however, standard quantization matrices are commonly used. The intensity ($Y$) and chrominance ($C_b$ and $C_r$) components use different quantization matrices. For the intensity component, the standard quantization matrix is

$$ Z = \begin{bmatrix}
16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\
12 & 12 & 14 & 19 & 26 & 58 & 60 & 55 \\
14 & 13 & 16 & 24 & 40 & 57 & 69 & 56 \\
14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\
18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\
24 & 35 & 55 & 64 & 81 & 104 & 113 & 92 \\
49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\
72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{bmatrix} \quad \text{(for intensity)}, \tag{75} $$

and for the chrominance components, the standard quantization matrix is

$$ Z = \begin{bmatrix}
17 & 18 & 24 & 47 & 99 & 99 & 99 & 99 \\
18 & 21 & 26 & 66 & 99 & 99 & 99 & 99 \\
24 & 26 & 56 & 99 & 99 & 99 & 99 & 99 \\
47 & 66 & 99 & 99 & 99 & 99 & 99 & 99 \\
99 & 99 & 99 & 99 & 99 & 99 & 99 & 99 \\
99 & 99 & 99 & 99 & 99 & 99 & 99 & 99 \\
99 & 99 & 99 & 99 & 99 & 99 & 99 & 99 \\
99 & 99 & 99 & 99 & 99 & 99 & 99 & 99
\end{bmatrix} \quad \text{(for chrominance)}. \tag{76} $$

From Equation (74), a quality factor $q = 50\%$ results in a scale factor $\alpha = 1$, and in that case the unscaled quantization matrices in Equations (75)-(76) are used.
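The quantization step can be sketched as a scale factor computation followed by an element-wise divide and round. The sketch below makes two assumptions that are not stated in the text: the effective step size is clamped to a minimum of 1 so that $q = 100\%$ (where $\alpha = 0$) does not divide by zero, and Python's built-in `round` is used, which rounds exact ties to the nearest even integer. The function names and the `min_step` parameter are hypothetical.

```python
def scale_factor(q):
    """Scale factor alpha for a quality factor q given in percent (0 < q <= 100)."""
    if not 0 < q <= 100:
        raise ValueError("quality factor must be in (0, 100]")
    return 50.0 / q if q < 50 else 2.0 - q / 50.0

def quantize(c, z, q, min_step=1.0):
    """Element-wise quantization of DCT matrix c by matrix z at quality q.

    min_step is a hypothetical guard that keeps the divisor from reaching zero.
    """
    alpha = scale_factor(q)
    return [
        [round(c_ij / max(alpha * z_ij, min_step)) for c_ij, z_ij in zip(c_row, z_row)]
        for c_row, z_row in zip(c, z)
    ]

c = [[80.0, -33.0], [7.0, 2.0]]   # a small made-up block of DCT values
z = [[16, 11], [12, 12]]          # upper-left corner of the intensity matrix
print(quantize(c, z, 50))
print(quantize(c, z, 25))         # lower quality: coarser steps, more zeros
```

Lowering the quality factor enlarges every step size, which drives more of the small ac components to zero and so lengthens the zero runs that the later encoding stages exploit.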


3.6.5  Zigzagging  of  Q. 

Matrix  Q  is  then  arranged  into  a  64-element  sequence  Q,  beginning  with  the  dc 
component, and zigzagging diagonally from the upper left to the lower right of $Q$.

3.6.6  Run-Amplitude Encoding.


Run-amplitude encoding is performed on all ac components of $Q$. These are
converted to ordered pairs of integers, where the first number in the pair is the run
length  of  zeros  (that  is,  the  number  of  consecutive  zeros  in  sequence),  and  the  second 
number  is  the  nonzero  component  that  follows.  The  last  nonzero  component  of  Q  is 
followed  by  an  End  of  Block  (EOB)  code. 
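The zigzag ordering and the run-amplitude encoding can be sketched together. This is a simplified illustration: it omits details of the actual JPEG bitstream, such as the special code for runs longer than 15 zeros, and it always appends the End of Block marker. The function names and the `EOB` placeholder are hypothetical.

```python
EOB = "EOB"  # placeholder for the End of Block code

def zigzag(m):
    """Read a square matrix along anti-diagonals, alternating direction."""
    n = len(m)
    order = sorted(
        ((i, j) for i in range(n) for j in range(n)),
        # Odd diagonals run top-right to bottom-left, even ones the other way.
        key=lambda ij: (ij[0] + ij[1], ij[0] if (ij[0] + ij[1]) % 2 else ij[1]),
    )
    return [m[i][j] for i, j in order]

def run_amplitude(seq):
    """Encode the ac part of a zigzag sequence as (zero_run, value) pairs."""
    ac = seq[1:]  # the dc component is handled separately
    last = max((k for k, v in enumerate(ac) if v != 0), default=-1) + 1
    pairs, run = [], 0
    for v in ac[:last]:  # stop after the last nonzero component
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append(EOB)
    return pairs

q = [[12, 0, 0, 0],
     [0, 5, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]  # a made-up 4x4 quantized block
sequence = zigzag(q)
print(sequence)
print(run_amplitude(sequence))
```

The trailing zeros cost nothing beyond the End of Block marker, which is why stray nonzero values late in the sequence are expensive for compression.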

3.6.7  Huffman  Encoding. 

Finally,  Huffman  encoding  is  performed  on  the  dc  component  of  Q  and  the  run- 
amplitude  encoded  ac  components.  The  Huffman  codes  vary  in  bit  length.  The 
value  associated  with  each  Huffman  code  is  saved  in  a  table  at  the  beginning  of 
the JPEG file. Optimally, the most frequently occurring Huffman codes have the
shortest  bit  lengths.  Usually,  four  separate  Huffman  tables  are  used:  dc  luminance, 
ac  luminance,  dc  chrominance,  and  ac  chrominance.  These  tables  are  customizable; 
however,  standard  Huffman  tables  are  commonly  used.  Compression  occurs  during 
the  quantization,  run-amplitude  encoding,  and  Huffman  encoding  steps. 

3.6.8  Summary. 

The CST, DCT, and quantization involve linear operations (addition and multiplication) which are promising applications for the energy-parsimonious inexact computing described in the literature [27]. Several methods exist for optimizing the computational efficiency of the discrete cosine transformation [46, 47]; these could be further improved via inexact computing.

Decoding  is  accomplished  by  performing  the  above  seven  steps  in  reverse  order. 
Since  losses  occur  during  quantization,  the  decoded  image  will  not  be  exactly  the 
same  as  the  original.  Also,  color  information  is  lost  due  to  the  chroma  subsampling 
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described  in  Section  3.6.2. 


Key to the JPEG compression performance is the run-amplitude encoding (Section
3.6.6), which takes advantage of the fact that natural images vary slowly with respect
to x and y, have few nonzero ac components, and have long sequences of ac
components which are zero. Inexact computing technology must take this into account.
Nonzero values interjected into the quantized matrix Q will degrade the compression
performance of the system. Single event upsets can also cause this effect.

Since Huffman encoding (Section 3.6.7) and decoding involve variable bit lengths,
any error in the code could corrupt all the data that follow. Therefore, the Huffman
coding is probably not an application for inexact computing. In a radiation
environment, the Huffman codes, like everything else, are susceptible to single event upsets,
and could cause such data corruption. Due to the critical nature of the Huffman
codes, the Huffman coding algorithm is a good application for hardware or software
redundancy, error checking, or radiation hardening.

IV. Methodology


Our approach to characterizing inexact adders consists of a binary probabilistic
Boolean logic (PBL) simulation implemented using Matlab. To compute energy and
Energy-Delay Product (EDP), we implement adder circuits in a Spectre™ analog
simulation. We then calculate the errors of the inexact adders, fit them to probability
distributions, and compute summary statistics.

4.1  Circuit  Simulations 

Noisy analog circuits can be simulated in SPICE or Cadence Spectre™ software
via noisy voltage sources, current sources, resistors, or transistors. In the case of
voltage or current sources, noise is specified in terms of its Power Spectral Density
(PSD) σ², measured in volts squared per hertz (V²/Hz) or amps squared per hertz
(A²/Hz). Thermal noise, shot noise, and flicker noise of resistors and transistors can
also be simulated. In all cases, the user must specify the noise bandwidth for the
simulation. Noise can also be simulated in Matlab by the addition of a normally
distributed random voltage with a mean of zero and a standard deviation σ.

A generalized model for computing the energy consumption, delay, and probability
of correctness of an inexact computational circuit, as compared to a conventional
(exact) circuit, is shown in Figure 16. Using a SPICE or Spectre™ environment,
a Monte Carlo simulation can be performed, using a set X of randomly generated
digital input signals which switch at a specified clock rate. This input vector X varies
with time t, and is common to both the exact and the inexact circuit. The two circuits
are powered by separate voltage sources with magnitude $V_{DD}$ for the exact circuit,
and $\tilde{V}_{DD}$ for the inexact circuit, where $\tilde{V}_{DD} < V_{DD}$. When the simulation runs, the
exact output signal $Y$ and inexact output $\tilde{Y}$ can then be observed and compared.

Figure 16. Generalized circuit model for simulating the correctness, delay, and energy
consumption of an inexact device, as compared with an exact device.


The difference between the two outputs is the error ε, which is a function of X:

$$\varepsilon = \tilde{Y}[X(t)] - Y[X(t)]. \qquad (77)$$

If the inexact circuit is noise-susceptible, then Equation (77) becomes

$$\varepsilon = \tilde{Y}[X(t), w(t)] - Y[X(t)], \qquad (78)$$

where w represents the vector of all noise sources within the inexact circuit.

The simulation also predicts the power supply currents $i_{DD}$ of the exact circuit
and $\tilde{i}_{DD}$ of the inexact circuit. Using this information, the instantaneous power $\mathcal{P}$ of
the exact circuit can be computed as

$$\mathcal{P}(t) = V_{DD}\, i_{DD}(t), \qquad (79)$$

and for the inexact circuit the power is

$$\tilde{\mathcal{P}}(t) = \tilde{V}_{DD}\, \tilde{i}_{DD}(t). \qquad (80)$$

The energy consumption $\mathcal{E}$ of the exact circuit is

$$\mathcal{E}(t) = T\, \mathcal{P}(t), \qquad (81)$$

where T is the clock period, and for the inexact circuit the energy is

$$\tilde{\mathcal{E}}(t) = T\, \tilde{\mathcal{P}}(t). \qquad (82)$$


In digital circuits, the total power $\mathcal{P}$ equals the sum of the dynamic power $\mathcal{P}_d$ and
static power $\mathcal{P}_s$. Dynamic power depends on the present input state $X_i$, and also on
the previous state $X_{i-1}$. Static power depends only on the present input state, and is
generally much less than dynamic power. In this research, we are only interested in
the average powers $\mathcal{P}_d$ and $\mathcal{P}_s$ for exact circuits, and average powers $\tilde{\mathcal{P}}_d$ and $\tilde{\mathcal{P}}_s$ for
inexact circuits. While $\mathcal{P}_s$ and $\tilde{\mathcal{P}}_s$ are functions of all possible inputs $X_i$, $\mathcal{P}_d$ and $\tilde{\mathcal{P}}_d$
are functions of all possible $X_i$ paired with all possible $X_{i-1}$. For complex circuits,
it is impractical to test the entire input space. For this reason, we choose a random
sequence of inputs and perform a Monte Carlo simulation.

4.1.1 Spectre™ Simulation.

4.1.1.1 0.6 μm Technology.

In order to quantify the energy consumption and delay of the inexact 8-bit
Kogge-Stone adder, a Monte Carlo simulation of the circuit was conducted in the Cadence
Spectre environment, using C5N 0.6 μm technology by ON Semiconductor. Logic
gates were designed for minimum width, with equal pull-up and pull-down strength,
and a maximum fanout of 4. Random binary signals $\{a_0, \ldots, a_{N-1}, b_0, \ldots, b_{N-1}\}$ with
a 25% activity factor and a 50 MHz clock rate were used as inputs to the adder.
The performance of the adder was observed for 100 cycles (2 μs). Random errors
were introduced within the system by placing a Gaussian noise voltage source at each
node in the circuit. This effectively simulated the probability p of correctness used
in (42)-(46). The noise sources caused each node to vary from its "correct" voltage
by a random amount. The probability of correctness was the probability that the
noise would not exceed $V_{DD}/2$ at any given point in time. Noise is specified in terms
of its bandwidth and Power Spectral Density (PSD). The 0.6 μm simulation used a
500 MHz noise bandwidth and two different PSDs: $5 \times 10^{-10}$ and $1 \times 10^{-9}$ V²/Hz,
which produced the per-node probabilities of correctness shown in Table 5. It is
worth noting that the noise voltages in Spectre™ are roughly Gaussian, but were
not seen to produce values beyond ±3.05 standard deviations. It is also worth noting
that, although p = 1 appears twice in Table 5, this is not necessarily an error-free
condition, because the noise could cause additional delay during state transitions, or
accumulations of noise from multiple sources could cause errors.

Table 5. Probabilities of Correctness Per Node due to Noise Sources:
0.6 μm Technology

V_DD [V]    Noise #1 (5 × 10^-10 V²/Hz)    Noise #2 (1 × 10^-9 V²/Hz)
1.5         0.9343                         0.8603
2.0         0.9773                         0.9216
2.5         0.9942                         0.9633
3.0         1.0000                         0.9831
3.3         1.0000                         0.9899

4.1.1.2 14 nm Technology.

Spectre™ simulations of 8 and 16-bit ripple-carry adders were conducted using
a 14 nm finFET technology. Logic gates were simulated using all minimum-width
transistors with one finger and two fins. Simulations were run with a 500 MHz clock
rate, over a time span of 100 cycles (200 ns). To simulate the noise at each circuit node,
two different noise states were used: $1 \times 10^{-11}$ and $2 \times 10^{-11}$ V²/Hz, which produced
the per-node probabilities of correctness shown in Table 6. The noise bandwidth in
each case was 5 GHz.

Table 6. Probabilities of Correctness Per Node due to Noise Sources:
14 nm Technology

V_DD [V]    Noise #1 (1 × 10^-11 V²/Hz)    Noise #2 (2 × 10^-11 V²/Hz)
0.2         0.6726                         0.6241
0.3         0.7488                         0.6824
0.4         0.8145                         0.7365
0.5         0.8682                         0.7854
0.6         0.9101                         0.8286
0.7         0.9412                         0.8658
0.8         0.9632                         0.8970
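
The relationship between PSD, noise bandwidth, supply voltage, and per-node probability of correctness can be checked with a short Python calculation. The sketch below assumes a simple model that is our reading of the description, not a stated procedure: the noise is zero-mean Gaussian with variance equal to PSD times bandwidth, and a node is correct when the noise stays below $V_{DD}/2$. Under that assumption the values of Table 6 are reproduced closely; the 0.6 μm values of Table 5 agree only approximately.

```python
import math


def node_correctness(vdd, psd, bandwidth):
    """P(noise < Vdd/2) for zero-mean Gaussian noise of variance psd * bandwidth."""
    sigma = math.sqrt(psd * bandwidth)           # RMS noise voltage in volts
    z = (vdd / 2.0) / sigma                      # threshold in standard deviations
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF


# 14 nm case: 5 GHz noise bandwidth, PSDs of 1e-11 and 2e-11 V^2/Hz.
for vdd in (0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8):
    p1 = node_correctness(vdd, 1e-11, 5e9)
    p2 = node_correctness(vdd, 2e-11, 5e9)
    print(f"{vdd:.1f}  {p1:.4f}  {p2:.4f}")
```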


4.1.1.3 Energy Per Cycle.

The instantaneous power consumption $P(t)$ of the adder is found by multiplying
the power supply current $i_{DD}(t)$ by the power supply voltage $V_{DD}$:

$$P(t) = V_{DD}\, i_{DD}(t). \qquad (83)$$

The average power consumed between times $t_1$ and $t_2$ is

$$P_{avg} = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} P(t)\, dt, \qquad (84)$$

and the average energy per clock cycle is

$$E_{avg} = P_{avg}\, T = \frac{P_{avg}}{f}, \qquad (85)$$

where T is the clock period and f is the clock frequency.
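
Equations (83)-(85) can be applied to a sampled supply-current waveform as in the following Python sketch. It assumes the simulator output is available as two lists (time stamps and current samples) and uses trapezoidal integration; the function names and the constant 1 mA waveform are illustrative only.

```python
def average_power(t, i_dd, vdd):
    """Trapezoidal estimate of (1 / (t2 - t1)) * integral of Vdd * i_DD(t) dt."""
    energy = 0.0
    for k in range(1, len(t)):
        p_prev = vdd * i_dd[k - 1]               # instantaneous power, Eq. (83)
        p_curr = vdd * i_dd[k]
        energy += 0.5 * (p_prev + p_curr) * (t[k] - t[k - 1])
    return energy / (t[-1] - t[0])               # Eq. (84)


def energy_per_cycle(p_avg, f_clk):
    """Average energy per clock cycle, Eq. (85): E_avg = P_avg / f."""
    return p_avg / f_clk


# Illustrative waveform: constant 1 mA drawn from 3.3 V over 2 microseconds.
t = [k * 1e-9 for k in range(2001)]
i_dd = [1e-3] * len(t)
p_avg = average_power(t, i_dd, 3.3)
e_avg = energy_per_cycle(p_avg, 50e6)
print(p_avg, e_avg)
```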

4.1.1.4 Delay and Error.

To measure the normalized error $\hat{\varepsilon}$ of the simulated adder, the exact instantaneous
augmented sum $S^{+}(t)$, as a function of time t, was computed from the input signals
according to (24)-(25), and the inexact instantaneous augmented sum $\tilde{S}^{+}(t)$ was
computed from the output signals. The instantaneous error ε is computed as

$$\varepsilon(X, w) = \tilde{S}^{+}(X, w) - S^{+}(X), \qquad (86)$$

where X = {A, B} is the input vector for a two-input adder, w is the vector of noise
sources inside the adder, and $\tilde{S}^{+}$ is the approximate augmented sum computed by
the inexact adder. In the analog simulations, each noise source is an additive white
Gaussian noise (AWGN) source. For integer adders, it is helpful to normalize the
error to its maximum possible value. The normalized error is then

$$\hat{\varepsilon} = \frac{\varepsilon}{\varepsilon_{max}}, \qquad (87)$$

where for an N-bit integer adder the maximum possible error is

$$\varepsilon_{max} = (2^{N} - 1) + (2^{N} - 1) = 2^{N+1} - 2. \qquad (88)$$

The instantaneous error $\hat{\varepsilon}(t)$ was computed from (86)-(88), and then interpolated
along a uniform spacing of t. Computing error this way is not simple, however. Even
a noise-free adder experiences a time delay δ between the inputs and outputs, and
that delay is variable, depending on which input caused the change at the output. In
an 8-bit Kogge-Stone adder, the shortest delay $\delta_{min}$ is from inputs $\{a_0, b_0\}$ to output
$s_0$, while the longest delay $\delta_{max}$ is along the "critical" path, which could be from any
of the inputs to outputs $s_5$, $s_6$, or $s_7$. This is evident in Fig. 12. In this figure, we can
see that only one gate delay is required to compute $s_0 = P_{0:0} = a_0 \oplus b_0$. To compute
$s_7$, however, one gate delay is required to compute $P_{6:6} = a_6 \oplus b_6$ and $G_{6:6} = a_6 b_6$,
and then four more gate delays to compute $s_7$.

To determine the range of possible delays, a noise-free adder was simulated using
power supply voltages $V_{DD}$ = 1.5, 2.0, 2.5, 3.0, and 3.3 V. For a given $V_{DD}$, on each
clock pulse, the time span between the minimum and maximum possible delay is
considered an indeterminate state. For the purpose of error calculation, the domain
of $\hat{\varepsilon}(t)$ was restricted to exclude the indeterminate states. Statistics of the remaining
observations of $\hat{\varepsilon}$ could then be calculated. Metrics for the inexact adder are: mean
and RMS error, maximum delay $\delta_{max}$, average energy $E_{avg}$, and the energy-delay
product, calculated as

$$EDP = E_{avg}\, \delta_{max}. \qquad (89)$$
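
The exclusion of indeterminate states and the metrics above can be sketched in Python as follows. The sketch assumes that the inputs switch at the start of every clock period, that the error has already been resampled on a uniform time grid, and that a sample is discarded when its offset within the period lies between the minimum and maximum delay. The function names and sample values are hypothetical.

```python
import math


def settled_error_stats(t, err, period, d_min, d_max):
    """Mean and RMS of err, skipping samples inside each cycle's indeterminate window."""
    kept = [e for tk, e in zip(t, err)
            if not (d_min <= (tk % period) <= d_max)]
    mean = sum(kept) / len(kept)
    rms = math.sqrt(sum(e * e for e in kept) / len(kept))
    return mean, rms


def energy_delay_product(e_avg, d_max):
    """Energy-delay product, Eq. (89)."""
    return e_avg * d_max


# Two 20 ns cycles sampled every 1 ns; outputs are unsettled from 1 ns to 6 ns.
times_ns = list(range(40))
errors = [0.5 if 1 <= (tk % 20) <= 6 else 0.01 for tk in times_ns]
mean_err, rms_err = settled_error_stats(times_ns, errors, 20, 1, 6)
edp = energy_delay_product(66e-12, 6e-9)         # joules times seconds
print(mean_err, rms_err, edp)
```

Without the mask, the large transient values inside the window would dominate both statistics even for an adder whose settled outputs are nearly always correct.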

4.2  Probabilistic  Boolean  Logic  Simulations 

As described in Section 3.2.9, Chakrapani [38] defines the building blocks of
inexact digital logic circuits in terms of the probability of correctness p of each individual
logic gate. In a SPICE or Spectre™ environment, a noisy digital logic circuit can
be simulated as described in Section 4.1, and at its output we can observe p as a
function of σ. In Matlab, a binary circuit node with probability p of correctness
can be simulated by comparing p with a uniformly distributed random number, and
flipping the node to the "wrong" state if the random number exceeds p. By this
methodology, it is necessary to simulate every node in the circuit. Therefore, when
building a complex circuit, it is necessary to have a schematic of the proposed circuit
in order to correctly simulate the probabilities of correctness of the final outputs. A
basic adder is shown in Figure 8, and a basic multiplier in Figure 13. More complex
adders are commonly used in practice, and will be developed for this research.
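
The node-flipping procedure can be sketched in Python (the simulations in this work use Matlab; Python is used here only for illustration). The sketch assumes a generic one-bit adder built from two XOR gates, two AND gates, and one OR gate, with every gate output treated as a noisy node; the function names are hypothetical.

```python
import random


def noisy(bit, p, rng):
    """Return bit with probability p, otherwise the flipped ("wrong") value."""
    return bit if rng.random() <= p else bit ^ 1


def full_adder(a, b, c, p, rng):
    """One-bit adder from XOR, AND, and OR gates; every gate output is a noisy node."""
    x1 = noisy(a ^ b, p, rng)
    s = noisy(x1 ^ c, p, rng)
    g = noisy(a & b, p, rng)
    t = noisy(x1 & c, p, rng)
    cout = noisy(g | t, p, rng)
    return s, cout


def ripple_carry_add(a, b, n, p, rng):
    """Add two n-bit integers; returns the (n+1)-bit result including carry-out."""
    carry, total = 0, 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry, p, rng)
        total |= s << i
    return total | (carry << n)


# Monte Carlo estimate of how often an 8-bit adder with p = 0.99 is wrong.
rng = random.Random(1)
trials, wrong = 2000, 0
for _ in range(trials):
    a, b = rng.randrange(256), rng.randrange(256)
    wrong += ripple_carry_add(a, b, 8, 0.99, rng) != a + b
print(wrong / trials)
```

Because each output depends on the path that errors take through the gates, the per-node probability cannot be converted into an output error rate without following the schematic node by node.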

4.3  Inexact  Adders 


Figure 17. Schematics for AND, OR, XOR, and AND-OR-2-1 gates: (a) two-input AND
gate; (b) two-input OR gate; (c) two-input XOR gate; (d) AND-OR-2-1 gate (output
ab + c).

Many different types of simulations can be performed in order to evaluate the
performance of inexact adders:

1. Various adder architectures, including ripple-carry, carry lookahead,
Kogge-Stone etc.

2. Various types of inexactness, including probabilistic pruning, probabilistic logic
minimization, probabilistic Boolean logic, and noise.

3. Various adder sizes: N = 8, 16, 32, or 64 bits.

4. Integer and floating point adders.

5. Various distributions of the inputs A and B.

6. Varying the power supply voltage $V_{DD}$.

7. Varying amounts of the noise w at each node within the circuit.

8. Varying the probability p of correctness at each node within the circuit.

9. Applying more energy to the higher-order bits, in order to decrease the error ε.

Figure 18. Schematic of a noisy 8-bit ripple-carry adder. Additive white Gaussian
noise (AWGN) sources are shown at the output of each AND, XOR, and AND-OR-2-1
gate.

Figure 19. Schematic of a noisy 8-bit Kogge-Stone adder. Additive white Gaussian
noise (AWGN) sources are shown at the output of each AND, XOR, and AND-OR-2-1
gate.

Circuit simulations were performed with Spectre™ as described in Section 4.1,
and probabilistic Boolean logic simulations were performed with Matlab, as described
in Section 4.2.

Schematics for the AND, OR, XOR, and AND-OR-2-1 gates are shown in Fig.
17. Positive logic (instead of NAND and NOR) is consistent with [[32], Fig. 11]. The
schematic for a noisy 8-bit ripple-carry adder is shown in Fig. 18, and the schematic
for a noisy 8-bit Kogge-Stone adder is shown in Fig. 19. In these figures, a noise
source is at the output of each logic gate.

Adders were evaluated in terms of energy dissipation, delay, chip area, and error,
where the error $\hat{\varepsilon}$ is computed from (86)-(88). In the Spectre™ analog simulation
environment, the noise vector w is a set of additive white Gaussian noise (AWGN)
sources located at each node within the circuit. In the Matlab probabilistic Boolean
logic (PBL) simulations, the noise sources were discrete-valued, binary error sources.
Note that (87)-(88) do not apply to floating-point adders, and it is not practical to
normalize errors relative to the upper limits of IEEE 754 floating-point numbers.

4.3.1 Ripple-Carry Adder with Inexactness Only on Less-Significant
Bits.

To limit the error of an inexact adder, it is possible to design it such that the
most-significant bits are computed using exact (reliable) technology, and the
less-significant bits are computed using inexact (unreliable) technology. For an N-bit
ripple-carry adder, which is composed of N one-bit adders, the lower $N_{\text{inexact}}$ bits
would be computed using inexact one-bit adders, and the upper $N_{\text{exact}}$ bits would be
computed using exact one-bit adders, where $N_{\text{exact}} + N_{\text{inexact}} = N$. The benefit of
this is that for such an adder, $\varepsilon_{max}$ is limited to

$$\varepsilon_{max} = (2^{N_{\text{inexact}}} - 1) + (2^{N_{\text{inexact}}} - 1) = 2^{N_{\text{inexact}}+1} - 2, \qquad (90)$$

and the distribution of the error ε is the same as for an $N_{\text{inexact}}$-bit ripple-carry
adder. The trade-off is that more energy must be dissipated in order to compute
the upper $N_{\text{exact}}$ bits. The operation of a partially inexact 8-bit ripple-carry adder
with $N_{\text{exact}} = 3$ and $N_{\text{inexact}} = 5$ is illustrated in Figure 20. In the figure, the five
least significant bits, including the carries which ripple upward, are computed using
inexact one-bit adders. The three most significant bits, and the carry-out bit, are
computed using exact one-bit adders. In this example, it is possible for an erroneous
carry bit to ripple from position 4 into position 5; for this reason, it is still possible
for the uppermost bits to be erroneous. Repeated addition of such data will cause
further accumulation of errors.

Figure 20. Addition using a partially inexact 8-bit ripple-carry adder with $N_{\text{exact}} = 3$
and $N_{\text{inexact}} = 5$. Inexact addition is shown in red, and exact addition is shown in green.
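
A partially inexact ripple-carry adder of this kind can be sketched in Python so that the observed error can be compared with the limit in Equation (90). The sketch is a simplification: errors are injected at the sum and carry outputs of each inexact one-bit adder rather than at every internal gate, and the function names and the value p = 0.9 are assumptions chosen for illustration.

```python
import random


def one_bit_add(a, b, c, p, rng):
    """One-bit adder whose sum and carry outputs are each correct with probability p."""
    s = a ^ b ^ c
    cout = (a & b) | (c & (a ^ b))
    if rng.random() > p:
        s ^= 1
    if rng.random() > p:
        cout ^= 1
    return s, cout


def partially_inexact_add(a, b, n, n_inexact, p, rng):
    """n-bit ripple-carry addition; only the lowest n_inexact one-bit adders are noisy."""
    carry, total = 0, 0
    for i in range(n):
        p_i = p if i < n_inexact else 1.0        # upper bits use exact adders
        s, carry = one_bit_add((a >> i) & 1, (b >> i) & 1, carry, p_i, rng)
        total |= s << i
    return total | (carry << n)


# Largest error magnitude seen in a Monte Carlo run with N = 8, N_inexact = 5.
rng = random.Random(7)
worst = 0
for _ in range(5000):
    a, b = rng.randrange(256), rng.randrange(256)
    worst = max(worst, abs(partially_inexact_add(a, b, 8, 5, 0.9, rng) - (a + b)))
print(worst, 2 ** (5 + 1) - 2)
```

The upper three bits are always consistent with the carry they receive, so any error in them originates from the carry out of position 4.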

4.3.1.1 Energy Savings.

We assume that the energy consumed per cycle by an exact N-bit ripple-carry
adder is proportional to N:


(91) 


where $E_{\text{1bit-add,ex}}$ is the energy per cycle consumed by an exact one-bit adder. The
energy consumed by a partially inexact adder is


(92) 


where $E_{\text{1bit-add,in}}$ is the energy per cycle consumed by an inexact one-bit adder. The
energy savings of the partially inexact adder relative to the exact adder is


(93) 


4.4 Inexact Multipliers

A methodology similar to Section 4.3 was applied to multipliers. For this research,
shift-and-add and Wallace tree architectures were studied. For a multiplier, the error
is computed as

$$\varepsilon(X, w) = \tilde{P}(X, w) - P(X), \qquad (94)$$

where $P$ is the exact product and $\tilde{P}$ is the inexact product. For integer multipliers,
the normalized error $\hat{\varepsilon}$ is computed using the maximum possible error $\varepsilon_{max}$, which is

$$\varepsilon_{max} = (2^{N_P/2} - 1) \cdot (2^{N_P/2} - 1) = 2^{N_P} - 2^{N_P/2+1} + 1, \qquad (95)$$

where $N_P$ is the number of bits of the output product of the multiplier, and

$$N_P = N_A + N_B, \qquad (96)$$

where $N_A$ and $N_B$ are the bit widths of the multiplicands A and B respectively.

4.4.1  Shift-and-Add  Multiplier  with  Inexactness  Only  on  Less-Significant 
Bits. 

To limit the error of the shift-and-add multiplier, exact technology can be used to
compute the most significant bits, with inexact technology used for the less significant
bits. The schematic for a shift-and-add multiplier is shown in Figure 13. In this
work, we simulate an N-bit partially inexact multiplier as follows. We are given
an $N_A$-bit multiplicand A and an $N_B$-bit multiplicand B, where $N_A + N_B \le N$.
The uppermost $N_{\text{exact}}$ bits are computed exactly, and the remaining $N_{\text{inexact}}$ bits are
computed inexactly, where $N_{\text{exact}} + N_{\text{inexact}} = N$. The steps involved in shift-and-add
multiplication are:

1. A logical AND is performed between each bit of A and bit 0 (the least significant
bit) of B. The AND operation is inexact for bits 0 through $N_{\text{inexact}} - 1$; for
higher bits, the AND operation is exact. The resulting value is called $P_0$.

2. A logical AND is performed between each bit of A and bit 1 of B. The AND
operation is inexact for bits 1 through $N_{\text{inexact}} - 1$; for higher bits, the AND
operation is exact.

3. The value from step 2 is shifted one place to the left.

4. The value from step 3 is added to the value from step 1. In this work, the
addition is performed by an $N_A$-bit ripple-carry adder with a half-adder on the
LSB. The addition is inexact for bits 1 through $N_{\text{inexact}} - 1$; for higher bits, the
addition is exact. No operation is required on the LSB of $P_0$; it is simply routed
into the bit 0 position of the sum. The result is an $(N_A + 2)$-bit value $P_1$.

5. Steps 2 through 4 repeat for each of the bits in B, with the number of shifts
in step 3 incrementing with each iteration. However, inexact AND and inexact
addition are performed only on bit positions $N_{\text{inexact}} - 1$ and below.

An example of partially inexact multiplication is illustrated in Figure 21. In this
figure, $N_A = 4$, $N_B = 4$, $N_{\text{exact}} = 3$, and $N_{\text{inexact}} = 5$. The figure illustrates inexact
computation performed on the lower five bits of the product.
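
The five steps can be sketched in Python as a bit-level model. This is an illustrative simplification and not the simulation code used in this work: each AND output and each adder sum or carry output at a bit position below $N_{\text{inexact}}$ is flipped with probability 1 − p, and all function and parameter names are hypothetical.

```python
import random


def flip(bit, p, rng):
    """Return bit with probability p, otherwise its complement."""
    return bit if rng.random() <= p else bit ^ 1


def shift_and_add(a, b, n_a, n_b, n_inexact, p, rng):
    """Multiply a (n_a bits) by b (n_b bits); positions below n_inexact are noisy."""
    def p_at(pos):
        return p if pos < n_inexact else 1.0

    acc = 0
    for j in range(n_b):                         # one partial product per bit of b
        bj = (b >> j) & 1
        partial = 0
        for i in range(n_a):                     # AND each bit of a with bit j of b
            pos = i + j                          # position after shifting left by j
            partial |= flip(((a >> i) & 1) & bj, p_at(pos), rng) << pos
        if j == 0:
            acc = partial                        # P_0: nothing to add yet
            continue
        # Ripple-carry addition starting at bit j; lower bits pass straight through.
        carry, result = 0, acc & ((1 << j) - 1)
        for pos in range(j, j + n_a):
            x, y = (acc >> pos) & 1, (partial >> pos) & 1
            s = flip(x ^ y ^ carry, p_at(pos), rng)
            carry = flip((x & y) | (carry & (x ^ y)), p_at(pos), rng)
            result |= s << pos
        acc = result | (carry << (j + n_a))
    return acc


rng = random.Random(3)
print(shift_and_add(13, 11, 4, 4, 5, 1.0, rng))   # error-free when p = 1
print(shift_and_add(13, 11, 4, 4, 5, 0.9, rng))   # lower five bits may be wrong
```

In this model the loop over j uses $N_A$ one-bit adders for each of the $N_B - 1$ additions and $N_A N_B$ AND operations in total.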

4.4.1.1 Energy Savings.

We now derive the formula for the energy savings of an inexact multiplier. The
energy per cycle consumed by an exact shift-and-add multiplier can be computed as

$$E_{\text{mult,ex}} = N_A N_B E_{\text{and,ex}} + N_A (N_B - 1) E_{\text{1bit-add,ex}}, \qquad (97)$$

where $E_{\text{and,ex}}$ is the energy per cycle consumed by an exact AND gate, and $E_{\text{1bit-add,ex}}$
is the energy per cycle consumed by an exact one-bit adder. For this research, Eq.
(97) provides the baseline against which the energy consumption of inexact multipliers
is compared. In this research, we let $N_{\text{mult,ex}} <$ [...], where $N_{\text{mult,ex}}$ is the number
of uppermost bits in the final product P which are to be computed exactly. Given
that constraint, from the lower left portion of Fig. 21 we can see that the number of
error-free AND gates is

$$N_{\text{mult,and,ex}} = \sum_{k=1}^{N_{\text{mult,ex}}-1} k = \frac{(N_{\text{mult,ex}} - 1)\, N_{\text{mult,ex}}}{2}, \qquad (98)$$

and the number of exact one-bit adders is also

$$N_{\text{mult,add,ex}} = \sum_{k=1}^{N_{\text{mult,ex}}-1} k = \frac{(N_{\text{mult,ex}} - 1)\, N_{\text{mult,ex}}}{2} \qquad (99)$$

$$= N_{\text{mult,and,ex}}. \qquad (100)$$

The number of inexact AND gates is

$$N_{\text{mult,and,in}} = N_A N_B - N_{\text{mult,and,ex}}, \qquad (101)$$

and the number of inexact one-bit adders is

$$N_{\text{mult,add,in}} = N_A (N_B - 1) - N_{\text{mult,add,ex}}. \qquad (102)$$

Figure 21. Multiplication using a partially inexact 8-bit shift-and-add multiplier with
$N_{\text{exact}} = 3$ and $N_{\text{inexact}} = 5$. Inexact AND and inexact addition are shown in red, and
exact AND and exact addition are shown in green. In this example, the leftmost two
bits (sum and carry-out) of $P_1$ are the output of a single inexact one-bit adder. Likewise,
the leftmost two bits of $P_2$ and $P_3$ are each output from a single exact one-bit adder.

Table 7 shows the values of $N_{\text{mult,and,ex}}$, $N_{\text{mult,and,in}}$, $N_{\text{mult,add,ex}}$, and $N_{\text{mult,add,in}}$
computed using Equations (98)-(102), for 16-bit multipliers with values of $N_{\text{mult,ex}}$ ranging
from 0 to 6 bits. For example, if the final output product P is to have its three
most significant bits computed exactly, i.e. $N_{\text{mult,ex}} = 3$, then among the internal
components of the multiplier, the number of exact AND gates $N_{\text{mult,and,ex}} = 3$, the
number of inexact AND gates $N_{\text{mult,and,in}} = 61$, the number of exact one-bit adders
$N_{\text{mult,add,ex}} = 3$, and the number of inexact one-bit adders $N_{\text{mult,add,in}} = 53$. The table
shows how the total number of exact ANDs and exact additions increase as $N_{\text{mult,ex}}$
increases, and the total number of inexact ANDs and inexact additions decrease as
$N_{\text{mult,ex}}$ increases; that is, as $N_{\text{mult,ex}}$ increases from 0 to 6, $N_{\text{mult,and,ex}}$ and $N_{\text{mult,add,ex}}$
each increase from 0 to 15, while $N_{\text{mult,and,in}}$ decreases from 64 to 49 and $N_{\text{mult,add,in}}$
decreases from 56 to 41. In this example, where $N_A = 8$ and $N_B = 8$, the table shows
that in every row, $N_{\text{mult,and,ex}} + N_{\text{mult,and,in}} = 64$ (from Eq. (101)) and $N_{\text{mult,add,ex}} +
N_{\text{mult,add,in}} = 56$ (from Eq. (102)). Table 7 shows that, based on the model defined
in Equations (98)-(102), a multiplier cannot have a parameter $N_{\text{mult,ex}} = 1$, since the
two upper bits of the final product are the carry-out and sum bits of a single one-bit
adder, and we have chosen to define a one-bit adder as containing either all exact
components or all inexact components.

Table 7. 16-Bit Shift-and-Add Multipliers Using Exact & Inexact Bits
N_A = 8, N_B = 8

N_mult,ex    N_mult,and,ex    N_mult,and,in    N_mult,add,ex    N_mult,add,in
0            0                64               0                56
1            0                64               0                56
2            1                63               1                55
3            3                61               3                53
4            6                58               6                50
5            10               54               10               46
6            15               49               15               41
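
The entries of Table 7 follow directly from the counting model: the exact counts are triangular numbers in $N_{\text{mult,ex}}$, and the inexact counts are the remainders out of $N_A N_B$ AND gates and $N_A (N_B - 1)$ one-bit adders. A short Python check, with a hypothetical function name, is:

```python
def multiplier_counts(n_a, n_b, n_mult_ex):
    """Return (and_ex, and_in, add_ex, add_in) for a shift-and-add multiplier."""
    tri = (n_mult_ex - 1) * n_mult_ex // 2       # sum of k for k = 1 .. n_mult_ex - 1
    and_ex = add_ex = tri
    and_in = n_a * n_b - and_ex                  # remaining AND gates are inexact
    add_in = n_a * (n_b - 1) - add_ex            # remaining one-bit adders are inexact
    return and_ex, and_in, add_ex, add_in


for n_mult_ex in range(7):
    print(n_mult_ex, *multiplier_counts(8, 8, n_mult_ex))
```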

Assuming that the energy consumption due to inexact AND is proportional to the
number of bits computed that way, then the energy consumption of the inexact AND gates is

$$E_{\text{mult,and,in}} = N_{\text{mult,and,in}} E_{\text{and,in}}, \qquad (103)$$

where $E_{\text{and,in}}$ is the per-cycle energy consumption of a single inexact AND gate. Assuming
that the energy consumption due to inexact addition is proportional to the number
of bits so computed,

$$E_{\text{mult,add,in}} = N_{\text{mult,add,in}} E_{\text{1bit-add,in}}, \qquad (104)$$

where $E_{\text{1bit-add,in}}$ is the per-cycle energy consumption of an inexact one-bit adder. The energy
consumption of the partially inexact multiplier relative to the exact multiplier is


(105)


From Eq. (101)-(102) we can assume $N_{\text{mult,and,in}} > N_{\text{mult,add,in}}$. Since a one-bit adder
contains two XOR gates, two AND gates, and one OR gate (as shown in Fig. 6), we
can assume based on this component count that the energy consumption of a one-bit
adder is much greater than that of a single AND gate, that is, $E_{\text{1bit-add,in}} \gg E_{\text{and,in}}$
and $E_{\text{1bit-add,ex}} \gg E_{\text{and,ex}}$. Using these assumptions, Eq. (106) simplifies to


(107) 


4.5  Distribution  Fitting 

Given n observations of the normalized error ε, it is desirable to characterize the
sample in terms of a probability distribution. A Gaussian fit can be obtained by
calculating the sample mean and sample standard deviation. A Laplacian fit can be
obtained using (54). For normal product distributions with small values of ψ, the
PDF (58) and the data sample can be applied to (60) to determine the value of $\hat{\sigma}$
which maximizes the log-likelihood, where $\hat{\sigma}$ is an estimate of the scale parameter σ.

Figure 3. Original uncompressed image of an F-16, file name 4.2.05.tiff, from the USC
SIPI image database [25]. (repeated from page 15)

For higher-order normal product distributions, (58) becomes intractable. However,
it is possible to simulate normally-distributed variables $X_1, X_2, \ldots$ using a random
number generator, and multiply them together in accordance with (55) to obtain a
random sample of numbers from the desired population. From a random sample, a
histogram can be generated, and from this an empirical PDF can be inferred. The
empirical PDF can then be used with (60) to find the estimated distribution parameter
$\hat{\sigma}$. This was accomplished for NP distributions with 4 < ψ < 40 using a sample size
of  10®. 
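
The fitting steps can be sketched in Python with the standard library only. Several details are assumptions made for illustration: the Laplacian fit uses the usual maximum-likelihood estimators (median and mean absolute deviation), which may differ in form from (54); the normal product sample multiplies ψ independent zero-mean normal variables of equal standard deviation; and the sample sizes are kept small so the example runs quickly.

```python
import math
import random
import statistics


def fit_gaussian(x):
    """Gaussian fit: sample mean and sample standard deviation."""
    return statistics.fmean(x), statistics.stdev(x)


def fit_laplace(x):
    """Maximum-likelihood Laplace fit: location = median, scale = mean |x - median|."""
    mu = statistics.median(x)
    return mu, statistics.fmean(abs(v - mu) for v in x)


def sample_normal_product(psi, sigma, n, rng):
    """Draw n samples of the product of psi independent N(0, sigma^2) variables."""
    return [math.prod(rng.gauss(0.0, sigma) for _ in range(psi)) for _ in range(n)]


def empirical_pdf(x, bins, lo, hi):
    """Histogram-based density estimate on [lo, hi) with equal-width bins."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in x:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / (len(x) * width) for c in counts]


rng = random.Random(0)
gauss_sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
print(fit_gaussian(gauss_sample), fit_laplace(gauss_sample))

np_sample = sample_normal_product(2, 1.0, 5000, rng)
pdf = empirical_pdf(np_sample, 40, -4.0, 4.0)
print(max(pdf))
```

The empirical PDF from the simulated sample then stands in for the closed-form density when the likelihood of the observed errors is evaluated for candidate scale parameters.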

4.6  Optimizing  the  JPEG  Algorithm  for  Inexact  Computing 

The preceding sections have described our methodology for simulating inexact
adders and multipliers. This section describes our methodology for limiting the
precision of the computations within the JPEG compression algorithm, consistent with
Section 3.1.1.1, and also our choice to use exact computation on the most significant
bits of each addition and multiplication operation. The image compression analysis
in this dissertation was performed using the data set shown in Fig. 3, from the
University of Southern California (USC) Signal and Image Processing Institute (SIPI)
image database [25]. The still images in the SIPI database are all uncompressed
tagged image file format (TIFF) files.

4.6.1  Limited  Precision. 

The  inexact  computing  decision  howchart  in  Fig.  1  indicates  that  for  purposes 
of  saving  energy,  delay,  and  area,  we  should  not  use  any  more  precision  than  is 
necessary  to  store  data.  The  JPEG  compression  algorithm  is  described  in  Section  3.6 
and  illustrated  in  Fig.  2.  In  (72)  we  have  two  8x8  matrix  multiplications:  U  times 
y,  and  then  \UY]  times  U'^ .  Y  is  an  8-bit  signed  integer  ranging  from  —128  to  127. 
The  values  in  U  range  from  —0.49  to  -1-0.49.  Although  these  are  fractional  numbers, 
we  can  use  integer  multiplication  to  perform  the  DCT:  we  can  multiply  each  element 
of  U  by  2^^’^ ,  where  ANu  is  a  positive  integer,  and  for  negative  elements  in  U  we 
use  the  two’s  complement  representation.  This  method  is  valid  as  long  as  we  divide 
by  2^^^  after  the  DCT  is  complete.  In  this  work,  we  let  ANu  =  7,  so  f/  becomes  an 
8-bit  signed  integer  (that  is,  one  sign  bit  followed  by  a  7-bit  signihcand).  Multiplying 
one  element  of  U  by  one  element  of  Y  results  in  a  15-bit  signed  product  (one  sign  bit 
followed  by  a  7-|-7=14-bit  signihcand).  However,  an  additional  three  bits  are  needed 
to  accomplish  an  8  x  8  matrix  multiplication.  The  (r,  c)th  element  of  [UY],  denoted 
[UY]r,c,  is  computed  by 


8 


(108) 


k=l 


YUrjYi^c  +  Ur, 8^8, c- 


(109) 
To understand the need for three additional bits, consider each pair of terms within
Eq. (109), where $(U_{r,1}Y_{1,c} + U_{r,2}Y_{2,c})$ is the first pair, $(U_{r,3}Y_{3,c} + U_{r,4}Y_{4,c})$ is the
second pair, etc. Each term consists of a 15-bit signed product. Adding a pair of them
together (with carry-out) produces a 16-bit signed sum, which can be up to twice
as large as a single term. If we group the terms into quadruples (or pairs of pairs),
the sum can be up to twice as large as a pair, requiring another bit. If we view the
eight terms of Eq. (109) as a pair of quadruples, the sum can be up to twice as
large as a quadruple, which requires another bit. So in our example, each element of
an 8x8 matrix product requires four 15-bit additions, two 16-bit additions, and
one 17-bit addition, producing an $N_{UY}$-bit signed product, where $N_{UY} = 18$. When
performing the DCT, we then multiply $[UY]$ by $U^T$. This adds another 7 bits due to
the multiplication and another 3 bits due to the addition, for a 28-bit signed product.
To properly scale the final DCT, we shift it 14 places to the right. This leaves an
8x8 matrix of $N_{UYU^T}$-bit signed integers, each of which can store a range of possible
DCT values from -8192 to +8191, where $N_{UYU^T} = 14$.
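
The bit-width bookkeeping above can be checked with a few lines of arithmetic. The
following is a minimal Python sketch, not the dissertation's Matlab code; the helper
names `product_bits` and `sum_bits` are hypothetical and exist only for illustration.

```python
import math

def product_bits(na: int, nb: int) -> int:
    """Width of a signed product of an na-bit and an nb-bit operand:
    one sign bit plus (na - 1) + (nb - 1) significand bits."""
    return 1 + (na - 1) + (nb - 1)

def sum_bits(term_bits: int, n_terms: int) -> int:
    """Width needed to add n_terms signed values of term_bits each
    without overflow (one extra bit per doubling of the term count)."""
    return term_bits + math.ceil(math.log2(n_terms))

N_U = 7                  # scaling exponent used in the text
u_bits = N_U + 1         # U stored as an 8-bit signed integer
y_bits = 8               # Y is an 8-bit signed integer

term_bits = product_bits(u_bits, y_bits)       # one U*Y product
n_uy = sum_bits(term_bits, 8)                  # one element of [UY]
term2_bits = product_bits(n_uy, u_bits)        # one [UY]*U^T product
n_full = sum_bits(term2_bits, 8)               # unscaled DCT element
n_final = n_full - 2 * N_U                     # after shifting right 14 places

print(term_bits, n_uy, n_full, n_final)
```

Under these assumptions the widths reproduce the 15-, 18-, 28-, and 14-bit figures
quoted in the paragraph above.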

However, we do not need that much precision and, as explained in Section 3.1.1.1,
precision costs energy, delay, and area. Recall that the original image data contains
only 8 bits of information per pixel. Without much loss of fidelity, after computing
the intermediate product $[UY]$, we can truncate the lower 8 bits, leaving only a 10-bit
signed representation of $[UY]$. If we drop 8 bits from the intermediate product, then
we only drop 6 bits from the final product, since the total right shift remains 14 places.
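
The scaled-integer DCT with truncation of the intermediate product can be sketched as
follows. This is an illustrative Python sketch rather than the Matlab function from the
appendices. It assumes that $U$ in Eq. (72) is the orthonormal 8x8 DCT-II matrix (whose
entries lie within ±0.49, consistent with the range stated above); the function names
and the `drop_bits` parameter are hypothetical.

```python
import math

N_U = 7  # scaling exponent used in the text

def dct_matrix():
    """Orthonormal 8x8 DCT-II matrix (assumed form of U)."""
    u = []
    for r in range(8):
        scale = math.sqrt(1 / 8) if r == 0 else math.sqrt(2 / 8)
        u.append([scale * math.cos((2 * c + 1) * r * math.pi / 16) for c in range(8)])
    return u

def matmul(a, b):
    """Plain 8x8 matrix product."""
    return [[sum(a[r][k] * b[k][c] for k in range(8)) for c in range(8)] for r in range(8)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct_float(y):
    """Reference DCT in floating point: U * Y * U^T."""
    u = dct_matrix()
    return matmul(matmul(u, y), transpose(u))

def dct_fixed(y, drop_bits=8):
    """Integer DCT: U is scaled by 2^N_U, the intermediate product [UY] loses its
    lower drop_bits bits, and the final product is shifted by the remaining amount."""
    ui = [[round(v * 2 ** N_U) for v in row] for row in dct_matrix()]
    uy = matmul(ui, y)                                   # up to 18-bit signed values
    uy = [[v >> drop_bits for v in row] for row in uy]   # truncate (arithmetic shift)
    full = matmul(uy, transpose(ui))
    return [[v >> (2 * N_U - drop_bits) for v in row] for row in full]

# Example: compare the two on an arbitrary block of 8-bit signed samples.
block = [[((r * 8 + c) * 37) % 256 - 128 for c in range(8)] for r in range(8)]
worst = max(abs(f - x) for fr, xr in zip(dct_fixed(block), dct_float(block))
            for f, x in zip(fr, xr))
print("largest absolute deviation:", worst)
```

The deviation reported by the example reflects both the rounding of $U$ to 8 bits and the
truncation of $[UY]$; it says nothing about the additional errors introduced by inexact
adders and multipliers.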

Furthermore, a DCT range from -8192 to +8191 is not representative of realistic
image data. JPEG is designed for photographs of the natural environment, and that
kind of data usually varies slowly throughout space, resulting in smaller DCT values,
especially in the lower right corner of the DCT matrix. Analysis of the "Mandrill"
(4.2.03.tiff), "Lena" (4.2.04.tiff), "F-16" (4.2.05.tiff), and "San Francisco" (2.2.15.tiff)
images from the SIPI database [25] reveals that for each element in $[UY]$, fewer than
10 bits are needed to represent the data, and for the final DCT $[UYU^T]$, fewer than
14 bits are needed. This is especially true for the lower right corner of the matrix,
representing high-frequency components which are usually small. Different bit widths
are appropriate for different positions within the DCT matrix. In this research, the
following bit widths were used:
These numbers are data-dependent. We assume our DCT data can fit into the bit
widths in Eqs. (110)-(111). If any DCT data overflow beyond these bit widths, errors
will occur in the affected 8x8 blocks. In that case, it would be necessary to modify
Eqs. (110)-(111) to accommodate the necessary range of DCT values. However, we do
not want to use larger bit widths than necessary, because that consumes more energy,
and when using inexact components, it risks introducing more errors into the DCT
computations.
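
The data-dependent check described above can be sketched as a small utility that finds,
for each position in the 8x8 matrix, the smallest two's complement width that holds
every observed value. This Python sketch is illustrative only; the function names are
hypothetical, and it does not reproduce the specific widths of Eqs. (110)-(111).

```python
def signed_bits_needed(value: int) -> int:
    """Smallest two's complement width that can represent value."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1

def fits(value: int, nbits: int) -> bool:
    """True if value lies in the nbits-wide two's complement range."""
    return -(1 << (nbits - 1)) <= value <= (1 << (nbits - 1)) - 1

def widths_per_position(blocks):
    """Given an iterable of 8x8 integer matrices, return the 8x8 matrix of
    minimum widths needed at each position across all blocks."""
    widths = [[1] * 8 for _ in range(8)]
    for blk in blocks:
        for r in range(8):
            for c in range(8):
                widths[r][c] = max(widths[r][c], signed_bits_needed(blk[r][c]))
    return widths
```

Running `widths_per_position` over the DCT blocks of a representative image set would
indicate whether a chosen allocation is wide enough, or where it would overflow.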

4.6.2  Exact  Computation  of  the  Most  Significant  Bits. 

It is our experience that inexact addition and multiplication of the most significant
bit produce unacceptably large and frequent errors in the CST and DCT stages of the
JPEG algorithm. In the CST, "unacceptable" errors result in pictures with a lot of
"bad" pixels which are either extremely dark or extremely bright. In the DCT, errors
manifest themselves in 8x8 blocks, and pictures with "unacceptable" errors have
numerous and intense artifacts as shown in Fig. 15. To limit the errors, we use exact
computation on the three most significant bits of every addition and multiplication
operation,  as  explained  in  Sections  4.3.1  and  4.4.1.  We  also  use  exact  computation 
when  computing  two’s  complement,  and  when  computing  the  sign  of  a  multiplication 
operation. 
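
The effect of protecting the most significant bits can be illustrated with a simplified
ripple-carry model. The Python sketch below is an assumption-laden illustration, not the
dissertation's simulator: it flips each full-adder output (sum and carry) with
probability 1 - p instead of modeling individual gates, and the names `add_msb_exact`
and `n_exact` are hypothetical.

```python
import random

def full_adder(a, b, c):
    """Exact one-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (c & (a ^ b))

def add_msb_exact(a, b, n, p, n_exact=3, rng=random):
    """Add two n-bit unsigned integers and return the (n+1)-bit augmented sum.
    The lower n - n_exact bit positions are inexact: each sum and carry output
    is flipped with probability 1 - p. The top n_exact positions are exact."""
    carry = 0
    total = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        if i < n - n_exact:
            if rng.random() > p:
                s ^= 1
            if rng.random() > p:
                carry ^= 1
        total |= s << i
    return total | (carry << n)

# Example: a very noisy 8-bit addition with the top three bits protected.
rng = random.Random(1)
print(add_msb_exact(200, 100, 8, p=0.5, rng=rng), "vs exact", 200 + 100)
```

In this simplified model the error of an n-bit addition cannot exceed
$2^{n-2} - 1$ in magnitude, because only the lower n - 3 sum bits and the single carry
into the exact section can be wrong.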

4.7  JPEG  Compression  Performance 

In this research, we divide the JPEG compression algorithm into its sub-algorithms,
as shown in Fig. 2, and then examine each one to determine which sub-algorithms
are candidates for inexact design, as shown in the flowchart in Fig. 1. The color space
transformation, discrete cosine transformation, and quantization sub-algorithms are
computational in nature, and are therefore candidates for inexact design. Tiling and
zigzagging are routing processes, and run-amplitude encoding and Huffman encoding
are encoding processes; these are not candidates for inexact design. The discrete
cosine transform is the most computationally intensive, and would most likely require
the largest chip area and energy consumption of all the sub-algorithms. The choices
of RMS error, signal-to-noise ratio, and compression ratio are examples of choosing
error metrics as shown in the flowchart.

This  research  views  JPEG  compression  performance  in  terms  of  image  distortion 
and  compression  ratio.  Image  distortion  is  a  subjective  concept  which  depends  on 
the  opinion  of  a  human  observer.  Without  resorting  to  psychovisual  experiments,  the 
most  commonly  used  measures  of  distortion  are  the  Mean  Square  Error  (MSE)  and 
SNR  of  the  JPEG  image  which  has  been  encoded  and  decoded,  as  compared  to  the 
original  source  image  [59].  Mean  Square  Error  is  computed  as 
where $Y_{i,j}$ is the original $(i, j)$th pixel value, the corresponding decoded pixel value
is taken from the JPEG file, $N_{pix,x}$ is the number of pixel columns, and $N_{pix,y}$ is the
number of pixel rows. The RMS error is the square root of the MSE, and the SNR is
In (113), the difference between the maximum and minimum pixel values is usually
close to 255. Since the pixel intensity has a range of 256 possible values, the relative
RMS error is simply the RMS error normalized to 256:

$$\text{Relative RMS Error} = \frac{\text{RMS Error}}{256}. \qquad (114)$$

Compression ratio is the ratio of the original file size (in bits) to the size of
the encoded file (in bits). For a grayscale image, each pixel of the original image is
represented by 8 bits, and the compression ratio $R$ is

$$R = \frac{8 N_{pix,x} N_{pix,y}\ \text{[bits]}}{\text{compressed file size [bits]}}\ \text{[bits per bit]}. \qquad (115)$$

For a three-component color image, each pixel is represented by 24 bits, and the
compression ratio is

$$R = \frac{24 N_{pix,x} N_{pix,y}\ \text{[bits]}}{\text{compressed file size [bits]}}\ \text{[bits per bit]}. \qquad (116)$$

This metric can also be expressed in terms of bits per pixel; that is, $8/R$ bits per pixel
for a grayscale image, and $24/R$ bits per pixel for a color image. The compression
ratio is heavily dependent on the amount of quantization performed, which in turn
depends on the quality factor a described in Section 3.6.4. Given a quality factor
a = 0.25, a typical compression ratio is R = 22, with an RMS error of 4.8 [46].
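
The distortion and compression metrics of this section can be sketched in a few lines.
The Python below is illustrative, not the dissertation's code. It assumes the standard
definition of mean square error (the average of squared pixel differences); the SNR of
Eq. (113) is omitted because its exact form is not reproduced here. All function names
are hypothetical.

```python
import math

def mse(original, decoded):
    """Mean square error between two images given as lists of pixel rows."""
    rows, cols = len(original), len(original[0])
    total = sum((original[i][j] - decoded[i][j]) ** 2
                for i in range(rows) for j in range(cols))
    return total / (rows * cols)

def rms_error(original, decoded):
    """Root-mean-square error: the square root of the MSE."""
    return math.sqrt(mse(original, decoded))

def relative_rms_error(original, decoded):
    """RMS error normalized to the 256-level pixel range, as in Eq. (114)."""
    return rms_error(original, decoded) / 256

def compression_ratio(n_cols, n_rows, compressed_bits, components=1):
    """Original size (8 bits per component per pixel) over compressed size."""
    return 8 * components * n_cols * n_rows / compressed_bits

def bits_per_pixel(ratio, components=1):
    """Equivalent bit rate: 8/R for grayscale, 24/R for three-component color."""
    return 8 * components / ratio

# Example with a tiny 2x2 "image".
orig = [[0, 10], [20, 30]]
dec = [[0, 12], [20, 26]]
print(mse(orig, dec), rms_error(orig, dec), relative_rms_error(orig, dec))
```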


4.8  Matlab  Scripts 

Monte  Carlo  simulations  of  adders  and  multipliers  using  PBL-based  error  models 
can  run  very  fast  in  Matlab,  enabling  data  collection  with  large  sample  sizes.  For 
example,  using  a  desktop  computer,  Matlab  can  simulate  a  16-bit  Kogge-Stone  adder 
10®  times  within  a  few  seconds.  More  realistic  simulations  of  adders  and  multipliers 
can  be  performed  using  Spectre™  as  described  in  Section  4.1;  however,  the  Matlab 
PBL simulations can process much more data within a short period of time.

This  section  presents  the  function  calls  to  the  Matlab-based  circuit  simulations. 
These  functions  were  designed  to  run  using  Matlab  version  2015a.  They  correspond  to 
the  results  shown  in  Chapter  V.  The  complete  functions  are  given  in  the  appendices. 
4.8.1 Ripple-Carry Adders.

The function call to the inexact ripple-carry adder simulation, described in Section
3.2.5, is

[S, Cout, SO] = Adder_RCA_inexact(n, A, B, Cin, p)

where the input n is the bit length of the inputs A and B, Cin is the carry-in bit,
and p is the probability of correctness of each binary computation within the adder;
the output S is the sum $S$, Cout is the carry-out bit, and SO is the augmented sum
$S^+$ defined in Eq. (25). The code for this function is given in Appendix A.1.1 on
page 125. The results of this simulation are presented in Section 5.1.1.

4.8.2  Kogge-Stone  Adders. 

The function call to the inexact Kogge-Stone adder simulation, described in
Section 3.2.7, is

[S, Cout, SO] = kogge_stone_inexact_PBL(n, A, B, Cin, p)

where the input and output parameters are the same as in Section 4.8.1. The code for
this function is given in Appendix A.1.2 on page 131. The results of this simulation
are presented in Section 5.1.1.
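
To make the structure of such a simulation concrete, the following Python sketch models
a Kogge-Stone parallel-prefix adder in which every node output may be flipped with
probability 1 - p. It is not the Matlab function from Appendix A.1.2; the node-level
error model, the function name `kogge_stone_add`, and the return layout are assumptions
made for illustration.

```python
import random

def kogge_stone_add(a, b, n, cin=0, p=1.0, rng=random):
    """n-bit Kogge-Stone addition. Each node output is flipped with
    probability 1 - p. Returns (sum, carry_out, augmented_sum)."""
    def node(bit):
        return bit ^ 1 if rng.random() > p else bit

    # Bitwise generate (g) and propagate (h) signals.
    g = [node((a >> i) & (b >> i) & 1) for i in range(n)]
    h = [node(((a >> i) ^ (b >> i)) & 1) for i in range(n)]

    # Parallel-prefix tree: after the last stage, G[i] and P[i] span bits 0..i.
    G, P = g[:], h[:]
    d = 1
    while d < n:
        G_next, P_next = G[:], P[:]
        for i in range(d, n):
            G_next[i] = node(G[i] | (P[i] & G[i - d]))
            P_next[i] = node(P[i] & P[i - d])
        G, P = G_next, P_next
        d *= 2

    # Carry into each bit position, then the sum bits.
    carry = [cin] + [node(G[i] | (P[i] & cin)) for i in range(n)]
    s = 0
    for i in range(n):
        s |= node(h[i] ^ carry[i]) << i
    cout = carry[n]
    return s, cout, s | (cout << n)

# Example: exact (p = 1) and noisy (p = 0.9) additions of the same operands.
print(kogge_stone_add(100, 57, 8))
print(kogge_stone_add(100, 57, 8, p=0.9, rng=random.Random(0)))
```

With p = 1 the model reduces to an exact adder, which gives a convenient sanity check
before collecting error statistics at lower values of p.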

4.8.3  Ling  Carry-Lookahead  Adders. 

The function call to the inexact Ling radix-4 carry-lookahead adder simulation,
described in Section 3.2.6, is

[S, Cout, SO] = ling_adder_inexact_PBL(n, A, B, Cin, p)

where the input and output parameters are the same as in Section 4.8.1. The code for
this function is given in Appendix A.1.3 on page 134. The results of this simulation
are presented in Section 5.1.1.
4.8.4 Shift-and-Add Multipliers.

The function call to the inexact shift-and-add multiplier simulation, described in
Section 3.3, is

P = Multiplier_basic_inexact(A, B, na, nb, p)

where  na  and  nb  are  the  bit  lengths  of  the  inputs  A  and  B,  and  p  is  the  probability 
of  correctness  of  each  binary  operation  within  the  multiplier;  the  output  P  is  the 
product.  The  code  for  this  function  is  given  in  Appendix  C.3.1  on  page  166.  The 
results  of  this  simulation  are  presented  in  Section  5.2. 
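
A shift-and-add multiplier built on an inexact adder can be sketched as follows. This
Python sketch is illustrative and simplified: it handles unsigned operands only, skips
the addition when a multiplier bit is zero, and applies errors to the adder outputs but
not to the partial-product AND gates. The function names are hypothetical and do not
correspond to the Matlab code in Appendix C.3.1.

```python
import random

def noisy_add(x, y, width, p, rng):
    """Ripple-carry addition of two width-bit values. Each sum and carry
    output is flipped with probability 1 - p; the final carry is dropped."""
    carry = 0
    out = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s = a ^ b ^ carry
        carry = (a & b) | (carry & (a ^ b))
        if rng.random() > p:
            s ^= 1
        if rng.random() > p:
            carry ^= 1
        out |= s << i
    return out

def shift_and_add_multiply(a, b, na, nb, p=1.0, rng=random):
    """Multiply an na-bit by an nb-bit unsigned integer by accumulating
    shifted copies of a, one per set bit of b, with the inexact adder."""
    width = na + nb          # wide enough to hold the exact product
    acc = 0
    for i in range(nb):
        if (b >> i) & 1:
            acc = noisy_add(acc, a << i, width, p, rng)
    return acc

# Example: exact and noisy products of the same operands.
print(shift_and_add_multiply(13, 11, 4, 4))
print(shift_and_add_multiply(13, 11, 4, 4, p=0.95, rng=random.Random(3)))
```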

4.8.5  Wallace  Tree  Multipliers. 

The  function  call  to  the  inexact  Wallace  tree  multiplier  simulation,  described  in 
Section  3.3,  is 

P = multiplier_wallace_tree_inexact_PBL(A, B, p)

where  A  and  B  are  the  inputs  to  the  multiplier  and  p  is  the  probability  of  correctness 
of  each  binary  operation  within  the  multiplier;  the  output  P  is  the  product.  The 
code  for  this  function  is  given  in  Appendix  C.3.2  on  page  170.  The  results  of  this 
simulation  are  presented  in  Section  5.2. 

4.8.6  Floating-Point  Adders. 

The function call to the inexact floating-point adder simulation, described in
Section 3.5.1, is

[Ss, Es, Ms] = Adder_floating_inexact(Sa, Ea, Ma, Sb, Eb, Mb, fmt, p)

where the inputs Sa, Ea, and Ma are the sign, exponent, and mantissa of the first
addend A as shown in Eq. (64); Sb, Eb, and Mb are the sign, exponent, and mantissa of
the second addend B; fmt is a string equal to 'BINARY16', 'BINARY32', 'BINARY64',
or 'BINARY128' specifying the precision of the adder; and p is the probability of
correctness of each binary operation within the adder. The outputs Ss, Es, and Ms
are the sign, exponent, and mantissa of the sum S. The code for this function is given
in Appendix B on page 151. The results of this simulation are presented in Section
5.1.3.

4.8.7  Floating-Point  Multipliers. 

The function call to the inexact floating-point multiplier simulation, described in
Section 3.5.2, is

[Sp, Ep, Mp] = Multiplier_floating_inexact(Sa, Ea, Ma, Sb, Eb, Mb, fmt, p)

where the inputs Sa, Ea, and Ma are the sign, exponent, and mantissa of the first
multiplicand A as shown in Eq. (64); Sb, Eb, and Mb are the sign, exponent, and mantissa
of the second multiplicand B; fmt is a string equal to 'BINARY16', 'BINARY32',
'BINARY64', or 'BINARY128' specifying the precision of the multiplier; and p is the
probability of correctness of each binary operation within the multiplier. The outputs
Sp, Ep, and Mp are the sign, exponent, and mantissa of the product P. The code for
this function is given in Appendix D on page 178. The results of this simulation are
presented in Section 5.3.

4.8.8  Matrix  Multiplier. 

Matrix  multiplication  is  at  the  core  of  the  discrete  cosine  transform,  given  in  Eq. 
(72)  and  illustrated  in  Fig.  2.  The  function  call  to  the  inexact  matrix  multiplier 
simulation  is 
[C, nc] = mtimes_inexact_PBL(A, B, na, nb, p, bit)

where  the  inputs  A  and  B  are  matrices,  na  and  nb  are  the  bit  lengths  of  each  element 
in  A  and  B,  and  p  is  the  probability  of  correctness  of  each  binary  operation  within 
the  multiplier.  The  input  bit  is  the  highest-order  bit  which  can  be  inexact;  all  the 
lower-order  bits  up  to  and  including  bit  are  computed  inexactly,  while  the  higher- 
order  bits  are  computed  exactly.  The  output  C  is  the  matrix  product,  and  nc  is  the 
bit  length  of  each  element  in  C.  The  code  for  this  function  is  given  in  Appendix  E  on 
page  182. 

4.8.9  Discrete  Cosine  Transform. 

The  discrete  cosine  transform  is  a  key  part  of  the  JPEG  compression  algorithm, 
as  shown  in  Fig.  2.  The  function  call  to  the  inexact  discrete  cosine  transform  is 

B  =  DCT_inexact_PBL (A ,  nbits ,  p) 

where  the  input  A  is  the  8x8  matrix  of  image  data,  nbits  is  the  number  of  bits  of 
precision  allocated  to  Ur^c^Uc,r  where  U  is  the  DCT  matrix,  and  p  is  the  probability  of 
correctness  of  each  binary  operation  performed  during  the  DCT.  For  this  dissertation, 
nbits  is  always  22.  The  output  B  is  the  discrete  cosine  transform  of  the  input  A.  The 
code  for  this  function  is  given  in  Appendix  F.6.4.2  on  page  193.  The  JPEG  simulation 
results  are  presented  in  Section  5.4.2. 

4.8.10  JPEG  Compression  Algorithm. 

The  inexact  JPEG  compression  algorithm,  shown  in  Fig.  2,  is  performed  by  the 
main  program  given  in  Appendix  F.6.1  on  page  185.  This  program  uses  the  inexact 
DCT described in Section 4.8.9. The JPEG simulation results are presented in Section
5.4.


V.  Results 


We now present the results of our inexact adder, multiplier, and JPEG simulations
described in Chapter IV. These simulations were performed in Matlab using
a probabilistic Boolean logic (PBL) error model. These results illustrate how the
likelihood and distribution of errors in the simulations vary with the probability of
correctness p. This chapter also includes the results of the Spectre™ analog simulations,
which show how output error, energy consumption, and energy-delay product
(EDP) vary with p.

5.1  Inexact  Adders 

In  this  section,  we  present  the  results  of  the  Matlab  PBL  simulations  of  8,  16,  and 
32-bit  ripple-carry,  Kogge-Stone,  and  Ling  carry  lookahead  adders.  These  results  show 
how  the  distribution  of  output  errors  varies  with  p.  Also  in  this  section  we  present 
the  results  of  the  Spectre™  simulations  of  8  and  16-bit  ripple-carry  and  Kogge-Stone 
adders in 14 nm FinFET CMOS technology, and 8-bit Kogge-Stone adders in 0.6 µm
CMOS  technology.  These  results  show  how  the  probability  of  correctness,  output 
error,  energy,  and  EDP  vary  with  power  supply  voltage  and  noise  power  spectral 
density. 

5.1.1  Inexact  Adders  with  PBL. 

An  8-bit  noisy  Kogge-Stone  adder  was  simulated  using  probabilistic  Boolean  logic, 
as  described  in  Section  4.2,  for  various  values  of  p.  The  inputs  A  and  B  were  randomly 
drawn from a uniform distribution between 0 and $2^8 - 1$; in each simulation, the
sample size was 10®. For each simulation, the noisy augmented sum $\tilde{S}^+$ (defined in
(24)-(25)) and the normalized error $\varepsilon$ (defined in (86)-(88)) were observed. Error
histograms for p = 0.90, 0.95, and 0.99 are shown in Fig. 23. The figure shows
that as p increases, $\varepsilon$ is more tightly dispersed around 0. Each histogram has a
primary mode at $\varepsilon = 0$ and multiple other modes at powers of 1/2. The modes are at
powers of 1/2 because the most frequent errors involve only a single binary digit within
the adder. For p > 0.99 the distribution of $\varepsilon$ is highly kurtotic, so much so that
a semilogarithmic scale is necessary to display the tails of the distribution. In this
case, a high-order normal product distribution is the best fit to the sample, which is
characterized by an extremely tall, narrow peak at $\varepsilon = 0$, and extremely infrequent,
but widely dispersed, nonzero values of $\varepsilon$.

Error  statistics  for  various  values  of  p  are  summarized  in  Table  8.  The  table  shows 
that  as  p  approaches  1,  the  error  standard  deviation  approaches  0  and  the  kurtosis 
increases. An error spectrum plot for p = 0.90 is shown in Fig. 24. This plot shows
the sample mean of 100 observations of $\varepsilon$ at every possible value of A and B. The
expected  error  appears  roughly  to  be  a  linear  function  of  A  +  B,  with  the  smallest 
error  magnitude  along  the  line  B  =  255  —  A.  Closer  inspection  reveals  discontinuities 
along  the  lines  A  =  64,  A  =  128,  A  =  192,  B  =  64,  B  =  128,  and  B  =  192. 
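
Summary statistics of the kind reported for the error samples can be computed as
sketched below. This Python sketch is illustrative only. It assumes population (biased)
central moments and non-excess kurtosis (a Gaussian has kurtosis 3); whether these match
the exact estimators used for the tables is an assumption. The function name
`error_statistics` is hypothetical.

```python
import math

def error_statistics(errors):
    """Return P(error = 0), mean, standard deviation, skewness, and kurtosis
    of a list of error samples, using population central moments."""
    n = len(errors)
    mean = sum(errors) / n
    m2 = sum((e - mean) ** 2 for e in errors) / n
    m3 = sum((e - mean) ** 3 for e in errors) / n
    m4 = sum((e - mean) ** 4 for e in errors) / n
    if m2 == 0:
        skewness = kurtosis = float("nan")   # undefined for a constant sample
    else:
        skewness = m3 / m2 ** 1.5
        kurtosis = m4 / m2 ** 2
    return {
        "p_zero": sum(1 for e in errors if e == 0) / n,
        "mean": mean,
        "std": math.sqrt(m2),
        "skewness": skewness,
        "kurtosis": kurtosis,
    }

# Example: a small symmetric sample with half of the errors equal to zero.
print(error_statistics([-1, 0, 0, 1]))
```

A sample dominated by zeros with rare large outliers, as described for values of p near
1, yields a small standard deviation together with a large kurtosis under these
definitions.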

5.1.2  Probability  Distributions. 

For each 2, 4, and 8-bit inexact adder, a maximum likelihood estimation (MLE) was
performed on the distribution of the normalized error $\varepsilon$. MLE was also performed on
a 6-bit ripple-carry adder, and a 16-bit Ling carry lookahead adder. Three different
types of distributions were considered: Gaussian, Laplacian, and normal product
distributions with order $2 \le \psi \le 40$. The results for the ripple-carry adder are shown in
Fig. 26. The Gaussian distribution seemed to be the best fit for values of p < 0.85,
and for higher values of p, a normal product distribution was usually the best fit. The
exception was the 2-bit ripple-carry adder, for which the Laplacian distribution was
(c) p = 0.99. NP20 fit with $\sigma$ = 0.13.


Figure 23. Error histograms showing the probability mass function (PMF) of $\varepsilon$ for an
inexact 8-bit Kogge-Stone adder, with inputs A and B uniformly distributed from 0 to
$2^8 - 1$, for various values of p. Results are from Matlab PBL simulation. The red line
in (a) is a Gaussian fit to the PMF, and in (b)-(c) the red lines are normal product
(NP) curve fits based on maximum likelihood estimation. The slight asymmetry in
the figures is due to the fact that the data are from Monte Carlo simulations based on
random number generation.


Table 8. Error Statistics: 8-Bit Kogge-Stone Adder with PBL

p         P(ε = 0)
          0.0056
0.8500    0.0109
0.9000    0.0319
0.9500    0.1473
0.9600
0.9700
0.9800    0.4484
0.9900    0.6668
0.9950    0.8150
          0.8843
0.9980
0.9999    0.9959

Other column headings: Mean, Std Dev. Other surviving entry: 0.3125.
Figure 24. Error spectrum for an inexact 8-bit Kogge-Stone adder, for all possible
values of the inputs A and B, and probability p = 0.90 of correctness at each node. Each
point on the plot is the sample mean of 100 observations of $\varepsilon$.


(a) Kogge-Stone adder. Histogram peak value is 0.0099.
(b) Ripple-Carry adder. Histogram peak value is 0.0457.
(c) Ling adder. Histogram peak value is 0.0810.


Figure 25. Error histograms for various inexact 32-bit adders, with p = 0.90, and with
inputs A and B uniformly distributed from 0 to $2^{32} - 1$. Results are from Matlab PBL
simulation. The modes are at powers of 1/2 because the most frequent errors involve
only a single binary digit within the adder.
usually the most likely fit. From Fig. 26a it is apparent that $\psi$ decreases monotonically
with $(1 - p)$, for smaller values of p. The other adder architectures displayed this
relationship as well. In each case, $\psi$ seems to increase as p increases, until $\psi$ reaches
40, after which its behavior becomes erratic. This erratic behavior may be due to the
fact that we only attempted to fit NP distributions with $\psi \le 40$, and may indicate
that the fits obtained were suboptimal. The monotonic relationship is indicated by
the bold lines connecting the points in Fig. 26a, and also the associated values of $\sigma$
in Fig. 26b.

Fig. 26c shows the estimated order $\psi$ as a function of the adder bit width $N$.
Looking again at only the non-erratic points mentioned above, Fig. 26c indicates
that for $N \le 8$, $\psi$ is roughly a linear function of $N$, with slope that increases with p.

Fig. 26d illustrates the sample kurtosis of $\varepsilon$ as a function of $(1 - p)$. It is apparent
that the kurtosis is almost independent of $N$. Also, the log of the kurtosis is nearly a
linear function of $\log(1 - p)$. As p increases, so does kurtosis, which is also associated
with an increase in the order $\psi$, as we would expect in accordance with Fig. 26a.

5.1.3  Comparisons  Among  Adder  Types. 

Figs. 25 and 27 compare the performances of the ripple-carry, Kogge-Stone and
Ling carry lookahead architectures. Fig. 25 shows histograms for each type of 32-bit
adder for a value of p = 0.90. It is apparent that each histogram has a primary
mode at $\varepsilon = 0$ and other modes at powers of 1/2. Fig. 27 provides summary statistics
for 8, 16, and 32-bit ripple-carry, Kogge-Stone, and Ling adders for various values
of p between 0.8000 and 0.9999. Fig. 27a shows that the probability of zero error
($\varepsilon = 0$) increases with p and decreases with $N$. Fig. 27b shows that the standard
deviation of $\varepsilon$ decreases with p, and is smaller for the ripple-carry and Ling adders
than for the Kogge-Stone adder. Interestingly, the 32-bit version of the Ling adder
(a) Normal product order $\psi$ as a function of $(1 - p)$.
(b) Normal product scale parameter $\sigma$ as a function of $(1 - p)$.
(c) Normal product order $\psi$ as a function of $N$.
(d) Sample kurtosis as a function of $(1 - p)$.


Figure 26. Error statistics for inexact $N$-bit ripple-carry adders with various values of
$N$ and p, and with inputs A and B uniformly distributed from 0 to $2^N - 1$. Results are
from Matlab PBL simulation. (a,b,c) Most likely normal product distribution to fit
each sample of $\varepsilon$. Due to resource constraints, distribution fits with $\psi > 40$ were not
attempted, resulting in some suboptimal values of $\psi$; these are the points not joined
by lines. (d) Sample kurtosis of $\varepsilon$.
has a smaller dispersion of $\varepsilon$ than the 8 and 16-bit versions. Fig. 27c shows very
little skewness, except when p is large. Fig. 27d shows that the distribution of $\varepsilon$ is
exceedingly kurtotic for large values of p.


(a) Probability that $\varepsilon = 0$.
(b) Standard deviation of $\varepsilon$.
(c) Skewness of $\varepsilon$.
(d) Kurtosis of $\varepsilon$.

Figure 27. Error statistics for various inexact adders, as functions of p, with inputs
A and B uniformly distributed from 0 to $2^N - 1$. Results are from Matlab PBL simulation.


We  considered  using  IEEE  754  standard  floating-point  adders  and  multipliers  for 
use  in  the  JPEG  compression  algorithm.  Initial  results  were  not  promising  for  this 
purpose,  due  to  large  errors  occurring  frequently,  and  we  were  able  to  implement  the 
JPEG  algorithm  using  integer  arithmetic.  The  results  are  included  here  for  complete¬ 
ness. 
Error histograms for various floating-point adders are shown in Fig. 28. The
modes of the distribution are at powers of 2.

(a) Half-precision adder with p = 0.9999.
(b) Single-precision adder with p = 0.99999.

Figure 28. Error histograms for inexact 16 and 32-bit floating-point adders, with inputs
A and B uniformly distributed from -100 to +100, for various values of p. Results are
from Matlab PBL simulation.

5.1.4 Spectre™ Simulation.

The Spectre™ simulation of the noisy 0.6 µm Kogge-Stone adder, described in
Section 4.1.1, produced the results shown in Table 9. These results show that
increased energy consumption is necessary to reduce error, and the relationship is
nonlinear. For example, when the noise PSD is $5 \times 10^{-10}$ V²/Hz and $V_{DD} = 1.5$ V,
$E_{avg} = 2.32$ pJ per cycle and the normalized error standard deviation is 0.1827; when
$V_{DD} = 3.3$ V, $E_{avg}$ increases to 12.70 pJ, and the error standard deviation decreases to
0.0584. As expected, increasing energy consumption also reduces delay. For example,
Table 9 shows that under noise-free conditions, when $V_{DD} = 1.5$ V the energy consumption
is 1.41 pJ per cycle and the delay is 12.28 ns, and when $V_{DD} = 3.3$ V the energy
consumption increases to 9.55 pJ and the delay decreases to 2.89 ns. An error histogram
is shown in Fig. 29. The errors in Table 9 are smaller than the Matlab PBL simulation
errors shown in Table 8. The Spectre™ simulation errors in Fig. 29 also appear
to be smaller than the PBL errors in Fig. 23a. The reason the binary error model
produces greater output errors than the analog model is that with the binary model,
the error at each node is either 0% or 100%, whereas the analog model allows a continuum
of errors at each node. With the analog model, an error voltage may slightly exceed
the $V_{DD}/2$ threshold for a brief amount of time, but may not have sufficient energy
to propagate throughout the circuit.

Plots of energy savings (i.e. percent reduction in $E_{avg}$) and Energy-Delay Product
(EDP) savings as functions of the error $\varepsilon$ are shown in Figs. 32-33. Fig. 32 shows
that energy savings drop sharply as $\varepsilon$ approaches zero. Scaling the supply voltage
increases delay. In spite of this trade-off, Fig. 33 shows that there is still a benefit in
terms of EDP.
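
The energy and EDP trade-off can be illustrated with the noise-free rows of Table 9.
The Python sketch below is illustrative: the delay and energy values are copied from
Table 9, while the choice of the 3.3 V operating point as the baseline for "savings" is
an assumption made here, and the function names are hypothetical.

```python
# Noise-free rows of Table 9: (V_DD [V], delay [ns], E_avg [pJ]).
NOISE_FREE = [
    (1.5, 12.28, 1.41),
    (2.0, 6.39, 3.49),
    (2.5, 4.73, 5.73),
    (3.0, 3.32, 8.27),
    (3.3, 2.89, 9.55),
]

def edp(delay_ns, energy_pj):
    """Energy-delay product in pJ*ns."""
    return delay_ns * energy_pj

def percent_savings(value, baseline):
    """Percent reduction of value relative to baseline."""
    return 100.0 * (1.0 - value / baseline)

# Baseline: the highest supply voltage in the table (an assumption).
_, base_delay, base_energy = NOISE_FREE[-1]
base_edp = edp(base_delay, base_energy)

for vdd, delay, energy in NOISE_FREE:
    print(f"V_DD = {vdd:.1f} V: "
          f"energy savings {percent_savings(energy, base_energy):5.1f}%, "
          f"EDP {edp(delay, energy):6.2f} pJ*ns, "
          f"EDP savings {percent_savings(edp(delay, energy), base_edp):5.1f}%")
```

For these rows the EDP falls as the supply voltage is lowered even though the delay
rises, which is the behavior described in the paragraph above.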
Figure 29. Error histogram for an 8-bit Kogge-Stone adder, with inputs A and B
uniformly distributed from 0 to $2^8 - 1$. Power supply $V_{DD} = 1.5$ V; noise PSD is $1 \times 10^{-9}$
V²/Hz. Result is from a Spectre™ simulation of 0.6 µm CMOS technology.

Fig. 30 shows one main result of this research. This figure shows the energy
reduction (percentage) as a function of the percentage error rate $(1 - p)$ for two adder
architectures. Results are shown for six values of the noise power spectral density
(a)-(f). The figure shows the energy reduction as a function of error rate for an 8-bit
Table 9. Error Statistics: Noisy 8-Bit Kogge-Stone Adder, Spectre™ Simulation

Noise-Free

V_DD [V]   δ_max [ns]   E_avg [pJ]
1.5        12.28        1.41
2.0         6.39        3.49
2.5         4.73        5.73
3.0         3.32        8.27
3.3         2.89        9.55

Noise #1: 5 × 10^-10 V²/Hz

V_DD [V]   E_avg [pJ]   Mean      Std Dev   Skewness   Kurtosis
1.5         2.32         0.0080   0.1827    -0.0183     6.09
2.0         4.42        -0.0031   0.1085    -1.0517    22.5
2.5         7.03        -0.0011   0.0536    -0.4252    44.4
3.0        10.42        -0.0019   0.0582    -0.2743    48.3
3.3        12.70        -0.0013   0.0584    -0.2544    50.9

Noise #2: 1 × 10^-9 V²/Hz

V_DD [V]   E_avg [pJ]   Mean      Std Dev   Skewness   Kurtosis
1.5         3.53        -0.0219   0.2327    -0.1222     3.93
2.0         7.09        -0.0270   0.1746    -0.7496     7.33
2.5        11.51        -0.0151   0.1319    -1.0512    11.1
3.0        16.46        -0.0150   0.1056    -1.8386    17.2
3.3        19.46        -0.0098   0.0945    -2.2123    22.7
ripple carry adder and a 16-bit ripple carry adder, both in 14 nm FinFET CMOS
technology. The figure also shows the energy reduction as a function of error rate for
an 8-bit Kogge-Stone adder and a 16-bit Kogge-Stone adder, both in 14 nm FinFET
CMOS technology. The figure also shows the energy reduction for an 8-bit
Kogge-Stone adder in a 0.6 µm CMOS technology.
Figure 30. Percent reduction in energy $E_{avg}$ per switching cycle, as a function of $1 - p$,
for the noisy 8 and 16-bit ripple-carry (RC) and Kogge-Stone (KS) adders in 0.6 µm
CMOS and 14 nm FinFET CMOS technologies. $1 - p$ is the probability of error at each
node within the adder circuit. Curves (a) through (f) represent the noise conditions
specified  by  their  power  spectral  densities:  (a)  3  x  10“^^  V^/Hz,  (b)  5  x  10“^^  V^/Hz, 
(c)  1  X  10-“  VVHz,  (d)  2  X  10-“  VVHz,  (e)  5  x  10-^°  VVHz,  and  (f)  1  x  10-®  VVHz. 
Results  are  from  Spectre^'^  simulation. 


The results for these adders in Fig. 30 show that the energy reduction tends to
increase as the error rate increases, for each technology. The energy reduction takes
on its largest value, over 90%, when the error rate exceeds 0.1. For example, for a
noise power spectral density of 3 × 10⁻¹² V²/Hz (see the ‘a’ curves in the figure), an
energy reduction of approximately 92% can be achieved with an error rate of 0.1 for
the 8-bit ripple-carry adder, 16-bit ripple-carry adder, 8-bit KS adder, and 16-bit KS
adder. The figure also shows that, at each value of 1 - p > 0.005, the energy
reduction achieved with an 8-bit KS adder in 14 nm FinFET CMOS technology with a
noise power spectral density of 3 × 10⁻¹² V²/Hz is greater than the energy reduction
achieved with an 8-bit KS adder in 0.6 µm CMOS technology with a noise power
spectral density of 5 × 10⁻¹² V²/Hz.

Fig. 31 shows the reduction in the energy-delay product per switching cycle as a
function of 1 - p, for an 8-bit ripple-carry adder, a 16-bit ripple-carry adder, an 8-bit
KS adder, a 16-bit KS adder, and a 0.6 µm 8-bit KS adder. For the lowest values of the
noise power spectral density, the reduction in the energy-delay product reaches a
maximum at a particular value of 1 - p. For example, for the lowest noise power
spectral density of 3 × 10⁻¹² V²/Hz (the ‘a’ curves), the energy-delay product achieves
its maximum reduction for the 8-bit KS adder in 14 nm FinFET CMOS technology (blue
curve with triangles) when 1 - p is approximately 0.05. The figure also shows that
the reduction in the energy-delay product tends to be smaller as the noise power
spectral density is increased.

Figure 31. Percent reduction in energy-delay product (EDP) per switching cycle, as a
function of 1 - p, for the noisy 8 and 16-bit ripple-carry (RC) and Kogge-Stone (KS)
adders in 0.6 µm CMOS and 14 nm FinFET CMOS technologies. 1 - p is the probability
of error at each node within the adder circuit. Curves (a) through (f) represent the
noise conditions specified by their power spectral densities: (a) 3 × 10⁻¹² V²/Hz, (b)
5 × 10⁻¹² V²/Hz, (c) 1 × 10⁻¹¹ V²/Hz, (d) 2 × 10⁻¹¹ V²/Hz, (e) 5 × 10⁻¹⁰ V²/Hz, and (f)
1 × 10⁻⁹ V²/Hz. Results are from Spectre™ simulation.

Fig. 32 shows another main result of this dissertation: the energy reduction as a
function of the standard deviation of the normalized output error for an 8-bit
ripple-carry adder and a 16-bit ripple-carry adder in 14 nm FinFET CMOS technology,
for an 8-bit KS adder and a 16-bit KS adder in 14 nm FinFET CMOS technology, and for
an 8-bit KS adder in 0.6 µm CMOS technology. Results are shown for six values of the
noise power spectral density, (a)-(f).

Figure 32. Percent reduction in energy Eavg per switching cycle, as a function of the
standard deviation of the normalized output error ε, for the noisy 8 and 16-bit
ripple-carry (RC) and Kogge-Stone (KS) adders in 0.6 µm CMOS and 14 nm FinFET CMOS
technologies. Curves (a) through (f) represent the noise conditions specified by their
power spectral densities: (a) 3 × 10⁻¹² V²/Hz, (b) 5 × 10⁻¹² V²/Hz, (c) 1 × 10⁻¹¹ V²/Hz,
(d) 2 × 10⁻¹¹ V²/Hz, (e) 5 × 10⁻¹⁰ V²/Hz, and (f) 1 × 10⁻⁹ V²/Hz. Results are from
Spectre™ simulation.


The results for these adders in Fig. 32 show that the energy reduction in each adder
architecture tends to take on its largest value, exceeding 90%, when the standard
deviation of the error exceeds approximately 0.18. For example, for a noise power
spectral density of 3 × 10⁻¹² V²/Hz (see the ‘a’ curves in the figure), an energy
reduction of approximately 95% can be achieved with a standard deviation of the
normalized output error of 0.18 for the 8-bit ripple-carry adder, 16-bit ripple-carry
adder, 8-bit KS adder, and 16-bit KS adder.

Fig. 33 shows the reduction in the energy-delay product per switching cycle as a
function of the standard deviation of the normalized output error. The results in this
figure show that the 8-bit ripple-carry adder and 16-bit ripple-carry adder in 14 nm
FinFET CMOS technology achieve the greatest reduction in the energy-delay product,
60%, when the noise power spectral density is 2 × 10⁻¹¹ V²/Hz, as shown in the ‘d’
curves, where the standard deviation of the error is approximately 0.32. The results
for the 8-bit KS adder and 16-bit KS adder show that as the noise power spectral
density is increased, the energy-delay product benefit is reduced from 50% to 22%
(see ‘a’ curves compared with ‘b’ curves).

Figure 33. Percent reduction in energy-delay product (EDP) per switching cycle, as
a function of the standard deviation of the normalized output error ε, for the noisy 8
and 16-bit ripple-carry (RC) and Kogge-Stone (KS) adders in 0.6 µm CMOS and 14
nm FinFET CMOS technologies. Curves (a) through (f) represent the noise conditions
specified by their power spectral densities: (a) 3 × 10⁻¹² V²/Hz, (b) 5 × 10⁻¹² V²/Hz,
(c) 1 × 10⁻¹¹ V²/Hz, (d) 2 × 10⁻¹¹ V²/Hz, (e) 5 × 10⁻¹⁰ V²/Hz, and (f) 1 × 10⁻⁹ V²/Hz.
Results are from Spectre™ simulation.


Plots of energy consumption Eavg and energy-delay product (EDP) as functions
of the error ε are shown in Fig. 32. This figure shows that Eavg greatly increases
as ε approaches zero. Scaling down the supply voltage increases delay. In spite of
this trade-off, Fig. 33 shows that there is still a benefit in terms of EDP, as
mentioned in the previous paragraph.

A similar result for the 14 nm ripple-carry adders is shown in Figs. 32-33. Fig.
32 confirms that, in a noisy circuit, energy savings can be achieved by allowing more
errors at the output. Fig. 33 shows that EDP improvements can be made by allowing
more errors, up to a point. However, if the error standard deviation is allowed to
increase beyond 0.12 (corresponding to Vdd < 0.4 V in curve (a)), the increased delay
begins to dominate the EDP. Therefore, correctness can only be traded for EDP up
to that point. A designer who is interested only in saving energy, and not concerned
about speed, would use Figs. 30 and 32 to choose acceptable values of p, output
error, and energy reduction. However, a designer concerned with delay could instead
use the EDP curves in Figs. 31 and 33.
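
The energy versus EDP trade-off can be illustrated with the noise-free rows of Table 9.
The sketch below is only an illustration: it assumes the 3.3 V row is the baseline (the
baseline used for the figures is not restated here), and the function names are
hypothetical. It shows that lowering the supply voltage cuts energy much more than it
cuts EDP, because the delay grows at the same time.

```python
# Noise-free rows of Table 9 (8-bit Kogge-Stone adder):
# supply voltage [V] -> (maximum delay [ns], average energy per cycle [pJ])
TABLE_9_NOISE_FREE = {
    1.5: (12.28, 1.41),
    2.0: (6.39, 3.49),
    2.5: (4.73, 5.73),
    3.0: (3.32, 8.27),
    3.3: (2.89, 9.55),
}


def percent_reduction(value, baseline):
    """Percent by which `value` is lower than `baseline`."""
    return 100.0 * (1.0 - value / baseline)


def tradeoff_table(rows, baseline_vdd=3.3):
    """Map each supply voltage to (energy reduction %, EDP reduction %).

    Assumption: reductions are measured against the row at `baseline_vdd`.
    """
    base_delay, base_energy = rows[baseline_vdd]
    base_edp = base_delay * base_energy
    result = {}
    for vdd, (delay, energy) in rows.items():
        result[vdd] = (
            percent_reduction(energy, base_energy),
            percent_reduction(delay * energy, base_edp),
        )
    return result


if __name__ == "__main__":
    for vdd, (e_red, edp_red) in sorted(tradeoff_table(TABLE_9_NOISE_FREE).items()):
        print(f"VDD = {vdd:.1f} V: energy reduction {e_red:5.1f}%, "
              f"EDP reduction {edp_red:5.1f}%")
```

Under these assumptions, the 1.5 V row gives an energy reduction of about 85% but an EDP
reduction of only about 37%, which is the kind of gap a delay-sensitive designer would
look for in the EDP curves.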

5.2  Shift-and-Add  Multiplier  with  PBL 

A  16-bit  noisy  shift-and-add  multiplier  was  simulated  using  probabilistic  Boolean 
logic,  as  described  in  Section  4.4,  for  various  values  of  p.  The  inputs  A  and  B  were 
randomly  drawn  from  a  uniform  distribution  between  0  and  2^  —  1;  in  each  simulation, 
the  sample  size  was  10^.  For  each  simulation,  the  noisy  product  P  and  the  normalized 
error ε were observed. Error histograms for p = 0.90, 0.95, and 0.99 are shown in
Fig. 34. The figure shows that as p increases, ε is more tightly dispersed around 0.
Each histogram has a primary mode at ε = 0 and multiple other modes at powers of

Error  statistics  for  various  values  of  p  are  summarized  in  Table  10.  Future  work 
will  include  Spectre™  analog  simulations  of  16-bit  shift-and-add  multipliers. 
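
A simulation of this kind can be sketched in a few lines. The sketch below is not the
Matlab code used for these results; it assumes a simple noise model in which the sum and
carry outputs of every one-bit full adder are each correct with probability p, and it
assumes the error is normalized by the largest possible product. The names `noisy_bit`,
`noisy_add`, `noisy_multiply`, and `normalized_errors` are hypothetical, and `n_bits` is
the operand width (the product is 2 * n_bits wide).

```python
import random


def noisy_bit(bit, p, rng):
    """Return `bit` with probability p and its complement with probability 1 - p."""
    return bit if rng.random() < p else bit ^ 1


def noisy_add(x, y, width, p, rng):
    """Ripple-carry addition of two `width`-bit integers using noisy full adders."""
    carry, result = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        s = noisy_bit(a ^ b ^ carry, p, rng)
        carry = noisy_bit((a & b) | (carry & (a ^ b)), p, rng)
        result |= s << i
    return result  # the final carry-out is dropped, so the result stays `width` bits


def noisy_multiply(a, b, n_bits, p, rng):
    """Shift-and-add product: one noisy addition per bit of the multiplier b."""
    width = 2 * n_bits
    acc = 0
    for i in range(n_bits):
        addend = (a << i) if (b >> i) & 1 else 0
        acc = noisy_add(acc, addend, width, p, rng)
    return acc


def normalized_errors(n_bits, p, samples, seed=0):
    """Sample (noisy product - exact product) / (largest exact product)."""
    rng = random.Random(seed)
    p_max = (2 ** n_bits - 1) ** 2  # assumed normalization constant
    errors = []
    for _ in range(samples):
        a = rng.randrange(2 ** n_bits)
        b = rng.randrange(2 ** n_bits)
        errors.append((noisy_multiply(a, b, n_bits, p, rng) - a * b) / p_max)
    return errors
```

With p = 1 every sampled error is zero, and lowering p spreads the samples away from
zero, which is the qualitative behavior described for Fig. 34.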
(a) p = 0.90. Histogram peak value is 0.035.
(b) p = 0.95. Histogram peak value is 0.078.
(c) p = 0.99. Histogram peak value is 0.456.

Figure  34.  Error  histograms  for  an  inexact  16-bit  shift-and-add  multiplier,  with  inputs 
A  and  B  uniformly  distributed  from  0  to  2^  —  1,  for  various  values  of  p.  Results  are 
from  Matlab  PBL  simulation. 
(a) p = 0.95. Histogram peak value is 0.007.
(b) p = 0.99. Histogram peak value is 0.135.
(c) p = 0.999. Histogram peak value is 0.799.


Figure  35.  Error  histograms  for  an  inexact  16-bit  Wallace  tree  multiplier,  with  inputs 
A  and  B  uniformly  distributed  from  0  to  2^  —  1,  for  various  values  of  p.  Results  are 
from  Matlab  PBL  simulation. 
(a) Probability that ε = 0
(b) Mean of ε
(c) Median of ε
(d) Standard deviation of ε
(e) Skewness of ε
(f) Kurtosis of ε

Figure  36.  Error  statistics  for  various  inexact  multipliers,  with  inputs  A  and  B  uni¬ 
formly  distributed  from  0  to  2^  —  1.  Results  are  from  Matlab  PBL  simulation. 
Table 10. Error Statistics: 16-bit Shift-and-Add Multiplier with PBL

p         P(ε = 0)    Mean      Std Dev    Skewness    Kurtosis
0.8000    0.0042      0.0966    0.2650      0.3295        4.04
0.8500    0.0047      0.0878    0.2478      0.4303        4.37
0.9000    0.0056      0.0745    0.2221      0.5528        5.07
0.9500    0.0177      0.0481    0.1733      0.8228        7.14
0.9600    0.0277      0.0408    0.1576      0.9201        8.09
0.9700    0.0513      0.0327    0.1389      1.1144        9.91
0.9800    0.1124      0.0230    0.1140      1.3123       13.2
0.9900    0.2901      0.0126    0.0842      1.9383       23.6
0.9950    0.5215      0.0064    0.0591      2.8432       42.4
0.9960    0.5915      0.0052    0.0529      3.0746       53.9
0.9970    0.6685      0.0040    0.0462      3.7838       71.1
0.9980    0.7645      0.0025    0.0374      4.1237      105
0.9990    0.8731      0.0012    0.0266      5.7969      203
0.9995    0.9330      0.0007    0.0186     10.5104      419
0.9996    0.9469      0.0004    0.0160      9.7384      537
0.9997    0.9591      0.0004    0.0146     16.2839      713
0.9998    0.9719      0.0004    0.0127     22.1914      928
0.9999    0.9859      0.0001    0.0082     13.4407     1965

5.3  Comparisons  Among  Multiplier  Types 


Fig. 36 provides summary statistics for 16, 32, and 64-bit shift-and-add and Wallace
tree multipliers for various values of p between 0.8000 and 0.9999. Fig. 36a shows that
the probability of zero error (ε = 0) increases with p and decreases with the bit width
N. Figs. 36b-c show that, for the Wallace tree with p < 0.98, the mean and median of ε
are much larger than for the shift-and-add multiplier. For example, for a 64-bit Wallace
tree multiplier with p = 0.95, the mean of ε is 0.2208 and the median is 0.2042, while
for the 64-bit shift-and-add multiplier the mean error is 0.0464 and the median is
0.0117. This nonzero mean is also evident in Fig. 35a. Fig. 36d shows that the standard
deviation of ε decreases with p. For example, for the 64-bit Wallace tree multiplier,
when p = 0.95 the standard deviation of ε is 0.3174; when p = 0.99 the error standard
deviation drops to 0.1743. From Fig. 36d, we can see that the dispersion of error is
greater for the Wallace tree than for the shift-and-add multiplier. For example, when
p = 0.90, the error standard deviation is 0.3420 for the 16-bit Wallace tree multiplier
and 0.3538 for the 32 and 64-bit Wallace tree multipliers, but only 0.2204 for the 16,
32, and 64-bit shift-and-add multipliers. Fig. 36e shows slight skewness for small p,
which greatly increases when p is large. Fig. 36f shows that the distribution of ε is
exceedingly kurtotic for large values of p. For example, when p > 0.998, the kurtosis
of ε for the 16, 32, and 64-bit shift-and-add multipliers is greater than 100. Figs.
36b-f show that the bit width of the multiplier has very little effect on the mean,
median, standard deviation, skewness, and kurtosis of the error distribution.
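
The summary statistics compared above can be computed from a list of error samples as
sketched below. This is an illustration under stated assumptions: it uses population
moments (dividing by the sample count) and the non-excess definition of kurtosis, for
which a normal distribution scores 3. The conventions used to produce Table 10 and
Fig. 36 are not restated in this section, and the function name `error_statistics` is
hypothetical.

```python
import math


def error_statistics(errors):
    """Summary statistics of a list of normalized error samples."""
    n = len(errors)
    mean = sum(errors) / n
    # Central moments, population form (divide by n).
    m2 = sum((e - mean) ** 2 for e in errors) / n
    m3 = sum((e - mean) ** 3 for e in errors) / n
    m4 = sum((e - mean) ** 4 for e in errors) / n
    ordered = sorted(errors)
    mid = n // 2
    median = ordered[mid] if n % 2 else 0.5 * (ordered[mid - 1] + ordered[mid])
    return {
        "p_zero": sum(1 for e in errors if e == 0) / n,  # fraction of exact results
        "mean": mean,
        "median": median,
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,  # non-excess kurtosis
    }
```

A distribution with most of its mass at ε = 0 and a few large outliers has a small
second moment and a comparatively large fourth moment, which is why the kurtosis grows
so quickly as p approaches 1.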

We  considered  using  IEEE  754  standard  floating-point  adders  and  multipliers  for 
use  in  the  JPEG  compression  algorithm.  As  was  mentioned  in  Section  5.1.3,  the  initial 
results  were  not  promising  for  this  purpose,  due  to  large  errors  occurring  frequently, 
and  we  were  able  to  implement  the  JPEG  algorithm  using  integer  arithmetic.  The 
results  are  included  here  for  completeness.  When  viewing  multiplier  performance 
in  terms  of  the  product  error  metric  e,  floating-point  multiplier  overall  errors  vary 
tremendously  with  p,  as  shown  in  Figs.  37-38. 

5.4  JPEG  Image  Compression 

The JPEG compression algorithm, repeated in Fig. 39, brings together the
methodologies described throughout this dissertation. The color space transform,
discrete cosine transform, and quantization can all be performed using inexact adders
and multipliers. In this work, we show the effects of the inexact color space
transformation and inexact DCT on the performance of the JPEG algorithm.

5.4.1  Inexact  Color  Space  Transform. 

(c) p = 0.9999.

Figure 37. Error histograms for an inexact half-precision floating-point multiplier, with
inputs A and B uniformly distributed from -100 to +100, for various values of p. Each
distribution has several modes, representing the following cases: (1) e = 1, meaning no
error; (2) powers of 2, meaning a single bit of the exponent was flipped from 0 to 1; (3)
powers of 1/2, meaning a single bit of the exponent was flipped from 1 to 0; (4) e close
to zero, meaning the MSB of the exponent was flipped from 1 to 0; and (5) e = -1,
meaning the sign bit was flipped. Results are from Matlab PBL simulation.

(a) p = 0.99.

Figure 38. Error histograms for an inexact single-precision floating-point multiplier,
with inputs A and B uniformly distributed from -100 to +100, for various values of p.
Plots are asymmetric because e < 0 only occurs when there is an error in the sign bit
of the output. Results are from Matlab PBL simulation.

Figure 39. Block diagram of the JPEG image compression algorithm. In this dissertation,
the JPEG algorithm is a motivational example for inexact computing. The shaded
boxes show areas where inexact methods can be considered (for example, in adder
circuits and multiplier circuits). The white boxes show areas where inexact methods
cannot be considered (“keep-out zones”).

This section illustrates the result of performing the inexact CST described in
Section 4.6 and shown in Fig. 39. Performance results for the inexact CST are shown in
Fig. 40. The images in this figure have not yet undergone compression. The figure
shows how the image degrades as p decreases. For example, when p = 0.999999 as
in Fig. 40b, the RMS error is 0.91 and the SNR is 23.73 dB. If the RMS error is
normalized relative to its maximum possible value of 255, then the RMS error of 0.91
normalizes to 0.36%. When p = 0.99 as in Fig. 40f, the RMS error degrades to 13.85
(5.4% normalized RMS error) with an SNR of 11.91 dB.
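
The normalization quoted above is a simple ratio against the 8-bit full scale. A minimal
sketch follows; the function names are hypothetical, and the SNR definition from the
methodology chapter is not reproduced here.

```python
import math


def rms_error(reference, test):
    """Root-mean-square difference between two equal-length pixel sequences."""
    if len(reference) != len(test):
        raise ValueError("pixel sequences must have the same length")
    squared = sum((r - t) ** 2 for r, t in zip(reference, test))
    return math.sqrt(squared / len(reference))


def normalized_rms_percent(rms, full_scale=255):
    """RMS error as a percentage of the maximum possible pixel value."""
    return 100.0 * rms / full_scale
```

For the two cases in the text, `normalized_rms_percent(0.91)` is about 0.36 and
`normalized_rms_percent(13.85)` is about 5.4.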

(a) Original image.
(b) p = 0.999999. RMS error = 0.91, SNR = 23.73 dB.
(c) p = 0.99999. RMS error = 0.99, SNR = 23.37 dB.
(d) p = 0.9999. RMS error = 1.52, SNR = 21.51 dB.
(e) p = 0.999. RMS error = 4.06, SNR = 17.24 dB.
(f) p = 0.99. RMS error = 13.85, SNR = 11.91 dB.

Figure 40. Uncompressed bitmap images computed using an inexact color space
transformation with various values of p, RMS error, and signal-to-noise ratio (SNR). Only
the intensity component Y is shown.

The formula for the CST for the intensity component Y of a single pixel is given
in Eq. (69). This equation contains three multiplications and two additions. Using
16-bit multipliers (NA = 8, NB = 8) with Nmult,exact = 3, followed by truncation of
the lower 8 bits, followed by 8-bit addition with Nadd,exact = 3, we have a total of:

• 2 × 3 = 6 exact one-bit adders from the two addition operations,

• 2 × 5 = 10 inexact one-bit adders from the two addition operations,

• 3 × 3 = 9 exact one-bit adders from the three multiplication operations (see
Table 7), and

• 3 × 53 = 159 inexact one-bit adders from the three multiplication operations.

From the above, there are 6 + 9 = 15 exact one-bit adders and 10 + 159 = 169
inexact one-bit adders per pixel for the CST. In this example, a contribution of this
dissertation is that we show 169/(169 + 15) ≈ 91.8% of the CST can be built from
inexact components. Using data from Fig. 30, we see that a probability of correctness
p = 0.99 with a noise PSD of 3 × 10⁻¹² V²/Hz gives an energy savings of about 62%,
and p = 0.999 gives an energy savings of about 15%. Employing exact computation
on the three most significant bits and using Eq. (107) gives the energy savings and
error values shown in Table 11.

Table 11. Energy Savings and Errors for Inexact Color Space Transformation

p        Energy Savings %    Normalized RMS Error %
0.99     57                  5.4
0.999    14                  1.6
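
The adder bookkeeping for the CST can be reproduced with a short calculation. The sketch
below is an illustration, not the dissertation's own method: Eq. (107) is not reproduced
in this section, so the code assumes that the overall savings is the per-adder savings
weighted by the fraction of inexact one-bit adders. That assumption happens to match the
energy column of Table 11 after rounding. The function names are hypothetical; the
figure of 53 inexact one-bit adders per multiplier is taken from the text.

```python
def adder_budget(n_add, add_width, n_mult, inexact_per_mult, n_exact=3):
    """Count exact and inexact one-bit adders for a set of additions and multiplications.

    Each addition keeps `n_exact` exact bits out of `add_width`; each multiplication
    contributes `n_exact` exact and `inexact_per_mult` inexact one-bit adders.
    """
    exact = n_add * n_exact + n_mult * n_exact
    inexact = n_add * (add_width - n_exact) + n_mult * inexact_per_mult
    return exact, inexact


def overall_savings(per_adder_savings_pct, exact, inexact):
    """Assumed weighting: only the inexact fraction of the adders saves energy."""
    return per_adder_savings_pct * inexact / (exact + inexact)


if __name__ == "__main__":
    exact, inexact = adder_budget(n_add=2, add_width=8, n_mult=3, inexact_per_mult=53)
    print(exact, inexact)                                # exact and inexact adder counts
    print(round(100 * inexact / (exact + inexact), 1))   # inexact share in percent
    for p, savings in ((0.99, 62.0), (0.999, 15.0)):
        print(p, round(overall_savings(savings, exact, inexact)))
```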


5.4.2  Inexact  DCT. 

This section illustrates the result of using an inexact DCT, but exact color space
transformation and no quantization. An inexact JPEG compression algorithm was
simulated as described in Section 4.6 and shown in Fig. 39. Performance results for
the inexact DCT are shown in Fig. 41. This figure shows how the image degrades as
p decreases. For example, when p = 0.999999 as in Fig. 41b, the RMS error is 2.29
(0.90% normalized) and the SNR is 19.73 dB. When p = 0.99 as in Fig. 41f, the RMS
error degrades to 50.51 (19.8% normalized) with an SNR of 6.29 dB. The artifacts in
Fig. 41f resemble those explained in Fig. 15 on page 52; this is the result of errors
in the DCT algorithm. The compression ratios are still low (1.63 at best) because
we have not yet done quantization as described in Section 3.6.4. After quantization,
typical compression ratios range from 1.7 (using a quality factor q = 100%) up to 22
(using a quality factor q = 25%) [46, p. 191]. The compression ratio in Fig. 41f is
degraded to 0.72 because errors in the inexact computation of the DCT introduce
artifacts into the DCT output. These nonzero artifacts reduce the sparsity of the DCT
matrix, resulting in a longer set of run-amplitude encoded data. With proper
quantization according to Equation (73), most of those artifacts would be filtered out.

(a) Uncompressed image.
(b) p = 0.999999. RMS error = 2.29, CR = 1.63, SNR = 19.73 dB.
(c) p = 0.99999. RMS error = 2.80, CR = 1.62, SNR = 18.86 dB.
(d) p = 0.9999. RMS error = 5.82, CR = 1.47, SNR = 15.68 dB.
(e) p = 0.999. RMS error = 17.87, CR = 1.05, SNR = 10.80 dB.
(f) p = 0.99. RMS error = 50.51, CR = 0.72, SNR = 6.29 dB.

Figure 41. JPEG images, without quantization, computed using an inexact discrete
cosine transformation with various values of p, RMS error, Compression Ratio (CR),
and Signal-to-Noise Ratio (SNR). Only the intensity component Y is shown.

The formula for the discrete cosine transform for an 8 × 8 block is given in Eq. (72).
One 8 × 8 matrix multiplication contains 512 multiplications and 448 additions, and
the second 8 × 8 matrix multiplication contains another 512 multiplications and 448
additions; the total is 1,024 multiplications and 896 additions. Using 16-bit multipliers
(NA = 8, NB = 8) with Nmult,exact = 3, and 16-bit addition with Nadd,exact = 3, we
have a total of:

• 896 × 3 = 2,688 exact one-bit adders from the 896 addition operations,

• 896 × 13 = 11,648 inexact one-bit adders from the 896 addition operations,

• 1,024 × 3 = 3,072 exact one-bit adders from the 1,024 multiplication operations
(see Table 7), and

• 1,024 × 53 = 54,272 inexact one-bit adders from the 1,024 multiplication
operations.

From the above, there are 2,688 + 3,072 = 5,760 exact one-bit adders and 11,648 +
54,272 = 65,920 inexact one-bit adders per block for the discrete cosine
transformation. In this example, a contribution of this dissertation is that we show
65,920/(65,920 + 5,760) ≈ 92.0% of the discrete cosine transform can be built from
inexact components. Using data from Fig. 30, we see that a probability of correctness
p = 0.99 with a noise PSD of 3 × 10⁻¹² V²/Hz gives an energy savings of about 62%,
and p = 0.999 gives an energy savings of about 15%. Employing exact computation on
the three most significant bits and using Eq. (107) gives the energy savings and error
values shown in Table 12. Note that the relative energy savings of the inexact DCT are
roughly the same as the relative energy savings for the inexact CST. Future work should
further examine optimization of the DCT for energy savings via inexact computing.
Future research will also examine the results of performing inexact quantization on
the overall performance of the JPEG compression algorithm shown in Fig. 39.

Table 12. Energy Savings and Errors for Inexact Discrete Cosine Transformation

p        Energy Savings %    Normalized RMS Error %
0.99     57                  19.8
0.999    14                  7.0
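
The operation counts behind the DCT figures follow directly from the size of a dense
matrix product. The sketch below derives them for two 8 × 8 products and then applies
the per-operation adder counts quoted in the text. The function names are hypothetical,
and the per-multiplier figures (3 exact, 53 inexact) are taken from the text rather than
derived.

```python
def matmul_op_counts(n):
    """Scalar multiplications and additions in one dense n x n matrix product."""
    mults = n * n * n        # n products for each of the n*n output elements
    adds = n * n * (n - 1)   # n - 1 additions to sum those n products
    return mults, adds


def dct_adder_counts(n=8, add_width=16, n_exact=3, inexact_per_mult=53):
    """One-bit adder counts for a 2-D DCT computed as two n x n matrix products."""
    mults, adds = (2 * count for count in matmul_op_counts(n))
    exact = (adds + mults) * n_exact
    inexact = adds * (add_width - n_exact) + mults * inexact_per_mult
    return mults, adds, exact, inexact


if __name__ == "__main__":
    mults, adds, exact, inexact = dct_adder_counts()
    print(mults, adds, exact, inexact)
    print(round(100 * inexact / (exact + inexact), 1), "% inexact")
```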


5.5  Remarks 

Although RMS error is a useful metric of image quality, it is very simplistic.
Models of human perception are very complex, and are not easily summarized by
simple metrics. If an image is intended for human consumption, it is up to the
human observer whether or not the image quality is “good enough”.

The JPEG standard has been around since the early 1990s [49]. Although researchers
are working on new image compression algorithms, the legacy JPEG algorithm is well
understood and still widely used. The results of this dissertation demonstrate a
promising approach to improving other image compression algorithms via inexact
computing.

VI. Discussion


The decision flowchart in Fig. 1 provides the overall framework for inexact computing
design. In this dissertation, we demonstrated the following steps in the flowchart:

• Profiling of the JPEG algorithm (Fig. 2),

• Deciding which sub-algorithms are amenable to inexact design (Sections 3.6.8
and 4.7 and Fig. 2),

• Choosing the right amount of precision (Section 4.6.1),

• Choosing error metrics (Section 4.7),

• Using noisy inexactness to reduce energy and area (Sections 4.1.1.4, 4.3, and
4.4), and

• Comparing the inexact sub-algorithm design vs. the exact sub-algorithm baseline
(Tables 11 and 12).

Tables  11  and  12  illustrate  the  tradeoffs  made  within  the  space  of  inexact  computing 
design. 

With sufficient energy reduction, breakpoints in the tradeoffs begin to emerge.
For example, consider the case in which the use of inexact methods reduces the energy
consumption to about 1/4 of its initial value, as shown in Fig. 32, curve (a), with
73% energy reduction and a 7.5% error standard deviation. In this case, circuit
designers can still come out ahead on energy even when three adders are used (triple
modular redundancy). In such cases, there are “break points” at which the energy
savings are sufficient to justify a change in circuit architecture, and the contribution
of this dissertation is to provide data for locating the break points for a few specific
adder circuits in 14 nm FinFET CMOS technology and 0.6 µm CMOS technology.

This dissertation helps designers find a way to pull ahead, by finding a tradeoff
in which power drops decisively faster than area grows. As an example, consider triple
modular redundancy, for which the breakpoint is an energy reduction of approximately
67% per module, not including the small area overhead of the voting circuitry; thus,
designers would need to achieve a gain of approximately four in order to beat the area
overhead of three modules.
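
The breakpoint arithmetic can be made explicit. The sketch below assumes that the total
energy of a redundant design is simply the number of modules times the energy of one
inexact module, and it ignores the voting circuitry, as the text does. The function
names are hypothetical.

```python
def redundancy_breakpoint_pct(n_modules=3):
    """Per-module energy reduction at which n inexact modules match one exact module."""
    return 100.0 * (1.0 - 1.0 / n_modules)


def net_savings_pct(per_module_reduction_pct, n_modules=3):
    """Net energy saving of n inexact modules relative to a single exact module.

    Voter overhead is ignored. A negative result means the redundant design
    uses more energy than the exact baseline.
    """
    remaining = n_modules * (1.0 - per_module_reduction_pct / 100.0)
    return 100.0 * (1.0 - remaining)
```

With three modules the breakpoint is about 66.7%. The 73% reduction quoted for Fig. 32,
curve (a), leaves a net saving of about 19% under these assumptions, and a four-fold
gain (75% reduction) leaves 25%.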

Note that designers can achieve this breakpoint in time or area. In time, one can
simply take three consecutive samples, which is very attractive for circuits: designers
can slow down the circuits, take more samples, and vote. The more one can slow down
the circuit, the more accurate the result can be, and one could then achieve perhaps a
20-fold energy reduction. In this way, designers can exploit the time dimension in
order to pull ahead.
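
Voting over three consecutive samples can be sketched as a bitwise majority. The example
below is illustrative only; the function names are hypothetical, and the residual error
formula assumes that bit errors in the three samples are independent, each occurring
with probability q = 1 - p.

```python
def majority_vote(x, y, z):
    """Bitwise majority of three integer samples of the same result."""
    return (x & y) | (x & z) | (y & z)


def voted_bit_error(q):
    """Probability that a voted bit is wrong when each sample bit is wrong
    independently with probability q (two or three of the samples must fail)."""
    return 3 * q ** 2 * (1 - q) + q ** 3


if __name__ == "__main__":
    # Two different single-bit upsets of 0b1011 are both outvoted.
    print(bin(majority_vote(0b1011, 0b1001, 0b0011)))
    print(voted_bit_error(0.01))
```

Under the independence assumption, a per-sample bit error rate of 0.01 drops to roughly
0.0003 after voting, at the cost of three evaluations in time.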

Another  question  we  are  asking  is,  “What  is  the  best  a  designer  can  ever  do?” 
For  example,  a  broken  clock  is  really  energy  efficient  but  is  not  wrong  all  of  the  time; 
the  broken  clock  is  correct  twice  a  day.  We  are  trying  to  understand  if  it  is  possible 
to  make  a  clock  that  wiggles  a  bit;  in  such  a  case  we  would  expect  that  it  is  wrong 
less  of  the  time.  The  same  principle  applies  to  inexact  adders. 

Other components, such as barrel shifters, bit counters, multiplexers, multipliers,
floating-point adders, and floating-point multipliers can be built using inexact logic
circuits. Such components can be used in the JPEG compression algorithm, as shown
in Fig. 2. However, in this work we used only integer adders and multipliers, and
the only bit shifting we did was simply a matter of truncating some of the least
significant bits.

The purpose of the compression algorithm is to reduce the size of the data while
retaining most of the information contained therein. The first step in the algorithm
is the color space transformation. In this step, the processor transforms the red,
blue, and green image components into a luminance component and two chrominance
components. This is a linear formula consisting of addition and multiplication, and
could be accomplished using integer adders and multipliers. This is a possible
application for inexact computing.
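
As an illustration of the integer arithmetic involved, the luminance component can be
computed with three 8-bit by 8-bit multiplications, truncation of the lower 8 bits of
each 16-bit product, and two additions. The sketch below is exact (no noise is injected)
and rests on assumptions: it uses the standard JFIF luminance weights
Y = 0.299 R + 0.587 G + 0.114 B, and the scaling of those weights by 256 is a
hypothetical choice made here for illustration, not necessarily the scaling used in this
work.

```python
# Standard JFIF luminance weights scaled by 256 and rounded to 8-bit integers.
# The scaling is an illustrative assumption; the three weights sum to 256.
K_R, K_G, K_B = 77, 150, 29


def luminance_int(r, g, b):
    """Integer luminance of one pixel with 8-bit inputs in the range 0..255."""
    # Three 8-bit x 8-bit multiplications; each 16-bit product is truncated
    # to its upper 8 bits by discarding the lower 8 bits.
    yr = (K_R * r) >> 8
    yg = (K_G * g) >> 8
    yb = (K_B * b) >> 8
    # Two additions. The sum cannot exceed 255 because the weights sum to 256.
    return yr + yg + yb


def luminance_float(r, g, b):
    """Floating-point reference using the unscaled weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b
```

Truncating each product separately costs a few counts of accuracy relative to the
floating-point reference, which is a deterministic error on top of any error introduced
by inexact adders.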

The  second  step  of  the  JPEG  compression  algorithm,  tiling,  does  not  involve 
inexact  components;  it  consists  of  wiring  only.  In  this  step,  the  processor  arranges 
each  data  component  into  8x8  blocks  of  pixels.  This  is  simply  routing  of  data.  Under 
our  inexact  computing  model,  routing  can  be  accomplished  via  hard-wiring  without 
any  inexactness. 

In  the  third  step,  the  processor  performs  the  discrete  cosine  transformation  (DCT) 
on  each  8x8  block  of  data.  This  consists  of  two  8x8  matrix  multiplications  involving 
non-whole  signed  numbers.  To  handle  fractional  numbers,  we  chose  to  use  signed 
integer  arithmetic  instead  of  IEEE  754  standard  floating  point  numbers,  because 
initial  results  indicate  integer  arithmetic  produced  lower  errors.  This  signed  integer 
arithmetic  was  optimized  by  the  use  of  various  p-values,  exact  computation  on  the 
three  most  significant  bits,  and  precision  limited  to  only  the  number  of  bits  needed, 
as  described  in  Section  4.6.  Future  research  will  further  examine  optimization  of  the 
DCT  for  energy  and  EDP  improvement  via  inexact  computing. 
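
The structure of the third step, two 8 × 8 matrix multiplications per block, can be
sketched as follows. This is an illustration only: Eq. (72) is not reproduced here, so
the code assumes the standard orthonormal DCT-II basis, and it uses floating-point NumPy
arithmetic for clarity rather than the signed integer arithmetic chosen in this work.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix; row k holds the k-th cosine basis vector."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    x = np.arange(n).reshape(1, -1)  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)       # the DC row has a different scale factor
    return c


def dct2(block):
    """2-D DCT of a square block as two matrix multiplications: C @ X @ C.T."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T


def idct2(coeffs):
    """Inverse 2-D DCT, also two matrix multiplications: C.T @ Y @ C."""
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c
```

For a constant block, only the DC coefficient is nonzero, which is why errors that
scatter energy into the other coefficients reduce the sparsity of the output.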

The  fourth  step  of  the  compression  algorithm,  quantization,  consists  of  dividing 
(or  multiplying)  each  DCT  output  by  a  constant  value.  The  purpose  of  quantization 
is  to  reduce  the  DCT  data  into  a  sparse  matrix  consisting  of  mostly  zeros.  This  could 
also  be  performed  using  inexact  multipliers. 
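
The effect of quantization on sparsity can be shown with a small sketch. It is a
simplification: a single uniform step size is assumed here, whereas the JPEG standard
uses an 8 × 8 table of step sizes, and the function names are hypothetical.

```python
import numpy as np


def quantize(coeffs, step):
    """Divide each DCT coefficient by a step size and round to the nearest integer."""
    return np.rint(coeffs / step).astype(int)


def sparsity(quantized):
    """Fraction of quantized coefficients that are exactly zero."""
    return float(np.mean(quantized == 0))


if __name__ == "__main__":
    coeffs = np.zeros((8, 8))
    coeffs[0, 0] = 800.0   # large DC term
    coeffs[3, 5] = 6.0     # small artifact, e.g. from an inexact DCT
    coeffs[7, 7] = -4.0    # another small artifact
    q = quantize(coeffs, step=16)
    print(q[0, 0], sparsity(q))
```

In this example the two small artifacts round to zero, so only the DC term survives,
which illustrates how quantization can filter out low-amplitude errors left by an
inexact DCT.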

The final three steps in the compression algorithm are not applications for inexact
computing. Step five is zigzagging, which is routing of data, and can be accomplished
without inexactness. Step six, which is run-amplitude encoding, and step seven,
which is Huffman encoding, are state machine applications, which are not tolerant of
error.
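To illustrate why zigzagging is only routing, the sketch below generates the zigzag visiting order as a fixed permutation of block positions; no value is modified. The function names are assumptions for illustration.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):                       # s = row + col of each anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Odd diagonals run top-right to bottom-left; even ones run the other way.
        order.extend(diag if s % 2 else reversed(diag))
    return order


def zigzag(block):
    """Read a square block out in zigzag order (a pure permutation of its entries)."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

Because the order depends only on the block size, it can be fixed at design time, consistent with the hard-wired routing assumed above.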

The results shown in this dissertation provide a promising approach to continued
improvement in the energy and EDP of the JPEG compression algorithm via inexact
computing, at the expense of tradeoffs in image quality and compression ratio.
We have demonstrated the inexact color space transformation and DCT; the other
component, quantization, is reserved for future work. Future work will also consider
the increases in energy consumption caused by degradation of the compression ratio,
and will apply inexact techniques to optimize JPEG image compression hardware
for reduced energy consumption and reduced EDP.

VII.  Summary and Conclusions


The past five decades have seen an insatiable demand for computation, supported
by the success of Moore’s Law. Concerns are now being raised about limits to the
power reduction that can be realized with traditional digital design and device
scaling. At the same time, a need exists to build fault-tolerant semiconductor chips
that can handle failure gracefully. These trends motivate researchers to investigate
areas such as probabilistic computing and inexact methods, which offer potential
energy savings, performance improvements, and area improvements.
Probabilistic computing offers a potential capability to extract functionality that is
‘good enough’ while operating with lower power dissipation.

This dissertation presents a method to quantify the energy savings resulting from
the decision to accept a specified percentage of error in some components of a
computing system. With the JPEG algorithm, loss is tolerated in certain components
(e.g., color space transformation and discrete cosine transform) that contain adder
circuits. The contribution of this dissertation is to provide energy-accuracy tradeoffs
for a few inexact adder architectures in 14 nm FinFET CMOS technology.

7.1  Adders 

This dissertation investigated the susceptibility to noise of some digital adder
circuits that are deliberately engineered to be imprecise. The adders are characterized
with probabilistic Boolean logic, which provides the capability to characterize random
noise in digital circuits. In this study, each binary logic gate was assigned a probability
p of correctness between 0.8000 and 0.9999.
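A minimal sketch of this gate model follows: a gate produces its correct output with probability p and the complemented output otherwise. It mirrors the `rand(...) > p` test used in the Appendix A listings, but the Python names and the Monte Carlo harness are assumptions for illustration.

```python
import random


def noisy_gate(op, a, b, p, rng):
    """Evaluate a two-input gate, flipping its output with probability 1 - p."""
    out = op(a, b)
    if rng.random() > p:   # same test as rand(...) > p in the Matlab listings
        out ^= 1
    return out


def and_gate(a, b):
    return a & b


def error_rate(op, p, trials, seed=0):
    """Estimate how often the noisy gate disagrees with the exact gate."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        a, b = rng.randint(0, 1), rng.randint(0, 1)
        if noisy_gate(op, a, b, p, rng) != op(a, b):
            errors += 1
    return errors / trials
```

With p = 1 the gate is exact; with p = 0.9 the estimated error rate should approach 0.1 as the number of trials grows.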

The contribution of this dissertation is quantitative data on the energy-accuracy
tradeoffs for 8-bit and 16-bit ripple-carry adders and 8-bit and 16-bit Kogge-Stone
adders in 14 nm FinFET CMOS technology, as functions of four levels
of noise power spectral density introduced in the circuits. Error histograms, standard
deviation, kurtosis, and probability of zero error are reported. The power supply
voltage ranges from 0.3 V to 0.8 V. The main results show that the energy reduction
can reach 92% at an error rate of 0.1 (where the noise
power spectral density is 3 x 10“^^ V^2/Hz). Second, results show
that an energy reduction of 95% can be achieved with a standard deviation of
normalized output error of 0.18 (again, where the noise power spectral density is
3 x 10“^^ V^2/Hz). The results also show that for Vdd = 0.6 V and
p = 0.91, energy consumption can be reduced by 50% compared with Vdd = 0.8 V,
with a normalized error standard deviation of 0.2 and a reduction of 30% in the
energy-delay product. When Vdd is further reduced to 0.5 V with p = 0.87,
the error standard deviation is 0.25, energy consumption is reduced by 65%, and the
energy-delay product is reduced by 40%. Results show that the energy-delay product
is minimized when the normalized error standard deviation is between
0.25 and 0.32. As error increases beyond this point, the increase in delay exceeds the
reduction in energy, and so the EDP starts to increase again.
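The relationship between the quoted energy and EDP reductions can be made explicit with a short calculation. Assuming the usual definition of EDP as the product of energy and delay, the delay penalty implied by each operating point follows directly. The function name is an assumption, and the implied delay ratios are derived here rather than quoted from the dissertation.

```python
def implied_delay_ratio(energy_reduction, edp_reduction):
    """Delay relative to baseline implied by EDP = energy * delay.

    Both arguments are fractional reductions relative to the Vdd = 0.8 V baseline.
    """
    energy_ratio = 1.0 - energy_reduction
    edp_ratio = 1.0 - edp_reduction
    return edp_ratio / energy_ratio


# Vdd = 0.6 V: 50% less energy, 30% lower EDP
print(implied_delay_ratio(0.50, 0.30))
# Vdd = 0.5 V: 65% less energy, 40% lower EDP
print(implied_delay_ratio(0.65, 0.40))
```

Under this assumption the delay grows by a factor of about 1.4 at 0.6 V and about 1.7 at 0.5 V, which is consistent with the observation that the EDP eventually rises once the growth in delay outpaces the savings in energy.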

7.2  Multipliers 

This dissertation presents a methodology for inexact multipliers, including
shift-and-add, Baugh-Wooley, and Wallace tree multipliers. Probabilistic Boolean logic
simulations of these multipliers and the associated error are performed in Matlab.
Results show that for a probability of correctness of 0.999, the normalized error
standard deviation achieved in an 8-bit shift-and-add multiplier is 2.66%. For the
shift-and-add multiplier, a methodology was presented to use exact technology to
compute the most significant bits and inexact technology to compute the least
significant bits, and an example was provided. An expression to calculate the energy
per cycle consumed by an exact shift-and-add multiplier was discussed.
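The structure of a shift-and-add multiplier, and the point at which an inexact adder would be substituted, can be sketched as below. The pluggable `add` argument and the deliberately crude `lossy_add` (which always clears the least significant bit) are hypothetical stand-ins, not the gate-level probabilistic model used in the dissertation.

```python
def shift_and_add(a, b, n, add=lambda x, y: x + y):
    """Multiply two unsigned n-bit integers by accumulating shifted partial products.

    `add` is the adder used for every accumulation; swapping it for an inexact
    adder makes the whole multiplier inexact.
    """
    acc = 0
    for i in range(n):
        if (b >> i) & 1:               # bit i of the multiplier selects a partial product
            acc = add(acc, a << i)
    return acc & ((1 << (2 * n)) - 1)  # keep the 2n-bit product


def lossy_add(x, y):
    """Hypothetical inexact adder: exact upper bits, least significant bit forced to 0."""
    return (x + y) & ~1
```

With the default adder the product is exact; with `lossy_add` the error stays confined to the low-order end of the result, which is the intent of computing the most significant bits exactly.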

The Matlab simulations of an 8-bit noisy shift-and-add
multiplier and an 8-bit Wallace tree multiplier show that as the probability of
correctness p increases, the standard deviation of the error decreases
in both multipliers, as shown in Fig. 34(a)-(c) and Fig. 35(a)-(c). Similar results are
summarized for 16-bit and 32-bit shift-and-add and Wallace tree multipliers, as shown
in Fig. 36. Detailed error statistics are summarized in Table 10.

7.3  JPEG 

This dissertation presents a methodology for an inexact JPEG compression algorithm.
Uncompressed TIFF files from the University of Southern California Signal and Image
Processing Institute (SIPI) are evaluated with this methodology.
This dissertation uses integer arithmetic and Matlab simulations to
carry out the inexact JPEG algorithm (see the green boxes and pink box in the flowchart
in Fig. 39).

The JPEG algorithm is composed of the following steps: color space transformation,
tiling, discrete cosine transform, quantization, zigzagging, run-amplitude encoding,
and Huffman encoding. As discussed in the dissertation, inexactness
can be tolerated in the first, third, and fourth steps of the JPEG
algorithm, namely color space transformation, discrete cosine transformation, and
quantization.

This dissertation presents an inexact approach to the JPEG algorithm through
incorporation of inexact adders and inexact multipliers in the color space
transformation step and the discrete cosine transformation step (green boxes in Fig. 39). In
this methodology, exact computation is used on the three most significant bits of each
addition and multiplication operation in the color space transformation step and in
the discrete cosine transformation step. Exact methods are used in the quantization
step for simplicity, even though it is recognized that this step can tolerate
inexact methods. Fig. 39 summarizes the JPEG algorithm. Note that floating point
arithmetic or integer arithmetic can be used in the methodology.
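One way to picture keeping the three most significant bits exact is to confine random bit flips to the remaining low-order positions of each result. The sketch below applies such a mask to an operation's output; it is a simplified output-level model with assumed names, whereas the Appendix A listings inject errors at individual gates through their `bit` argument.

```python
import random


def error_mask(width, exact_msbs, p, rng):
    """Random mask in which each of the low (width - exact_msbs) bits is set w.p. 1 - p."""
    mask = 0
    for bit in range(width - exact_msbs):
        if rng.random() > p:
            mask |= 1 << bit
    return mask


def inexact_result(value, width, exact_msbs, p, rng):
    """Flip only the low-order bits of `value`; the top `exact_msbs` bits stay exact."""
    return value ^ error_mask(width, exact_msbs, p, rng)
```

For an 8-bit result with three exact bits, the worst-case deviation is bounded by 31 regardless of p, which is what keeps the visual damage limited even at fairly low probabilities of correctness.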

The  results  obtained  in  this  dissertation  show  that  a  signal-to-noise  ratio  of  15.68 
dB  and  RMS  error  of  5.82  can  be  achieved  with  a  probability  of  correctness  of  99.99%, 
as  shown  in  Fig.  41(d).  The  results  also  show  that  a  signal-to-noise  ratio  of  6.29  dB 
and RMS error of 50.51 can be achieved with a probability of correctness of 99%, as
shown in Fig. 41(f).
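For readers who wish to reproduce figures of merit of this kind, the sketch below computes an RMS error and a signal-to-noise ratio between a reference image and a processed image, both given as flat lists of pixel values. These are common textbook definitions and are an assumption here; the exact definitions behind the reported values are those given earlier in the dissertation.

```python
import math


def rms_error(reference, test):
    """Root-mean-square difference between two equal-length pixel sequences."""
    n = len(reference)
    return math.sqrt(sum((r - t) ** 2 for r, t in zip(reference, test)) / n)


def snr_db(reference, test):
    """Signal-to-noise ratio in dB: reference energy over error energy."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    if noise == 0:
        return float("inf")
    return 10.0 * math.log10(signal / noise)
```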

We  recognize  that  a  fully  inexact  JPEG  algorithm  should  take  advantage  of  inexact 
methods  at  all  steps.  Therefore  future  work  should  consider  the  incorporation  of 
inexact  methods  in  the  quantization  step. 

7.4  Contributions 

Using inexact design methods, we have shown that we could cut energy demand
in half with 16-bit Kogge-Stone adders that deviated from the correct value by an
average of 3.0 percent in 14 nm CMOS FinFET technology, assuming a noise power
spectral density of 3 x 10“^^ V^2/Hz (see Fig. 32). This was achieved by reducing Vdd to 0.6 V
instead of its maximum value of 0.8 V. The energy-delay product (EDP) was reduced
by 38 percent (see Fig. 33).

Adders with a larger deviation of about 7.5 percent (using
Vdd = 0.5 V) were up to 3.7 times more energy-efficient, and the EDP was reduced
by 45 percent.

Adders with a still larger deviation of about 19 percent (using
Vdd = 0.3 V) were up to 13 times more energy-efficient, and the EDP was reduced
by 35 percent.
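The efficiency factors quoted above can be converted to fractional energy reductions for comparison with the percentages reported in Section 7.1. The conversion assumes that "k times more energy-efficient" means the energy per operation is divided by k; the function name is an assumption.

```python
def energy_reduction_from_factor(factor):
    """Fractional energy saving when energy per operation is divided by `factor`."""
    return 1.0 - 1.0 / factor


for k in (2.0, 3.7, 13.0):
    print(k, round(100.0 * energy_reduction_from_factor(k), 1))
```

Under this reading, a factor of 2 is the 50 percent saving at 0.6 V, 3.7 corresponds to roughly 73 percent, and 13 corresponds to roughly 92 percent.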

We used inexact adders and inexact multipliers to perform the color space
transform, and found that with a 1 percent probability of error at each logic gate, the
letters “F-16”, which are 14 pixels tall, and “U.S. AIR FORCE”, which are 8 to 10
pixels tall, are readable in the processed image, as shown in Fig. 40f, where the
relative RMS error is 5.4 percent.

We  used  inexact  adders  and  inexact  multipliers  to  perform  the  discrete  cosine 
transform,  and  found  that  with  a  1  percent  probability  of  error  at  each  logic  gate, 
the  letters  “F-16”,  which  are  14  pixels  tall,  and  “U.S.  AIR  FORCE”,  which  are  8  to 
10  pixels  tall,  are  readable  in  the  processed  image,  as  shown  in  Fig.  41f,  where  the 
relative  RMS  error  is  20  percent. 

This dissertation demonstrates the implementation of a complex algorithm using
inexact design methods. In this demonstration, inexactness is the result of noise,
crosstalk, RF interference, cross-chip variations, or other imperfections which affect
circuit performance in a probabilistic manner. These results show savings of energy,
delay, and area by continuing device scaling with hardware technologies which are
less than perfectly reliable. Future research will include fabrication of complex
systems using such unreliable technologies, and further development of inexact design
methodologies.

Appendix A.  Inexact Integer Adders


1.1  Ripple-Carry Adder

function [S, Cout, S0] = Adder_RCA_inexact(n, A, B, Cin, p, bit, msbhalfadder)
%Adder_RCA_inexact: Adds inputs A, B, and Cin, simulating a ripple-carry
%adder, except that each AND, XOR, and AO21 (and-or) gate has a random
%error probability equal to 1-p.
%
%Inputs:
%n: (positive integer) Number of bits processed by the adder.
%A, B: (n-bit integer arrays) Input arguments for the adder.
%  If B is a scalar, then B is treated as a constant, allowing for a
%  simplified hardware implementation, which results in less inexactness.
%Cin: (logical array) Carry-in input for the adder.
%p: (scalar) Probability of correctness of each AND, XOR, and AO21 gate
%  inside the adder.  0 <= p <= 1.
%bit: (integer vector) Which bit positions can be inexact.  Position 1 is
%  lowest-order bit.  (optional)  If bit is omitted, then all positions
%  can be inexact.
%msbhalfadder: Assumes the most significant bit (MSB) of B is always 0,
%  and uses a half-adder to add the MSB of A to the carry.
%
%Outputs:
%S0: (2*n-bit integer array) Sum of A, B, and Cin, including carry-out bit.
%S: (n-bit integer array) Lower n bits of S0, excluding carry-out bit.
%Cout: (logical array) Carry-out bit.
%
%References:
%N. H. E. Weste and D. M. Harris, CMOS VLSI Design, 4th ed.,
%Boston: Addison-Wesley, 2011, p. 449.
%
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic Boolean logic and
%its meaning," Tech. Rep. TR-08-05, Rice University, Department of
%Computer Science, Jun 2008.

% Determine the word size and signedness of each addend.
switch class(A)
    case 'int8'
        na = 8;   signedA = true;
    case 'uint8'
        na = 8;   signedA = false;
    case 'int16'
        na = 16;  signedA = true;
    case 'uint16'
        na = 16;  signedA = false;
    case 'int32'
        na = 32;  signedA = true;
    case 'uint32'
        na = 32;  signedA = false;
    case 'int64'
        na = 64;  signedA = true;
    case 'uint64'
        na = 64;  signedA = false;
    otherwise
        error 'Addends must be of the integer classes.'
end
A = unsigned(A);
signA = logical(bitget(A, n));

switch class(B)
    case 'int8'
        nb = 8;   signedB = true;
    case 'uint8'
        nb = 8;   signedB = false;
    case 'int16'
        nb = 16;  signedB = true;
    case 'uint16'
        nb = 16;  signedB = false;
    case 'int32'
        nb = 32;  signedB = true;
    case 'uint32'
        nb = 32;  signedB = false;
    case 'int64'
        nb = 64;  signedB = true;
    case 'uint64'
        nb = 64;  signedB = false;
    otherwise
        error 'Addends must be of the integer classes.'
end
B = unsigned(B);
signB = logical(bitget(B, n));

if n <= 8
    classname = 'uint8';
elseif n <= 16
    classname = 'uint16';
elseif n <= 32
    classname = 'uint32';
else
    classname = 'uint64';
end

% Default values for optional arguments.
if ~exist('Cin', 'var')
    Cin = [];
end

if ~exist('p', 'var')
    p = 1;
end

if ~exist('bit', 'var')
    bit = 1 : n;
end

if ~exist('msbhalfadder', 'var')
    msbhalfadder = false;
end

if isempty(bit)
    minbit = Inf;
    maxbit = Inf;
else
    minbit = min(bit(:));
    maxbit = max(bit(:));
end

constantadder = isscalar(B);

S = zeros(size(A), classname);
Cout = false(size(A));

% Initial propagate/generate stage.
if constantadder
    % If we are adding only a constant B, we can use fewer components,
    % and therefore less inexactness.  For each bit:
    % A AND 1 = A, and can be omitted altogether from hardware.
    % A AND 0 = 0.
    % A XOR 0 = A, and can be omitted.
    % A XOR 1 can be hard-wired as an inverter.
    AandB = bitand(A, B);    % generate
    AxorB = bitxor(A, B);    % propagate

    % Introduce errors only into the bits of B which are set high.
    err = bitand(B, biterrors(size(AxorB), p, classname, bit));
    AxorB = bitxor(AxorB, err);
    clear err
else    % A and B are both variables
    AandB = bitand_inexact(A, B, p, classname, bit);    % generate
    AxorB = bitxor_inexact(A, B, p, classname, bit);    % propagate
end

% Unpack the words into logical bit matrices (one column per bit position).
cols = 1 : n;
cols = repmat(cols, [numel(A), 1]);
Sbits = false(size(cols));
if isempty(Cin)
    Cbits = [false(numel(A), 1), Sbits];
elseif isscalar(Cin)
    Cbits = [repmat(logical(Cin), [numel(A), 1]), Sbits];
else
    Cbits = [logical(Cin(:)), Sbits];
end
Bbits = Sbits;
AandBbits = Sbits;
AxorBbits = Sbits;
AxorBandCbits = Sbits;
AandBbits(:) = bitget(repmat(AandB(:), [1, n]), cols);
AxorBbits(:) = bitget(repmat(AxorB(:), [1, n]), cols);
if isscalar(B)
    Bbits(:) = bitget(repmat(B(:), [numel(A), n]), cols);
else
    Bbits(:) = bitget(repmat(B(:), [1, n]), cols);
end

% Ripple the carry from the lowest bit to the highest bit.
for j = 1 : n
    k = find(j == bit, 1, 'first');    % nonempty if bit j may be inexact
    if (j == minbit) && isempty(Cin)    % half adder on lowest bit
        Sbits(:, j) = AxorBbits(:, j);
        Cbits(:, j+1) = AandBbits(:, j);
    elseif (j == maxbit) && msbhalfadder    % half adder on highest bit
        Amsb = logical(bitget(A(:), maxbit));
        Sbits(:, j) = xor(Amsb, Cbits(:, j));
        Cbits(:, j+1) = Amsb & Cbits(:, j);
        if k
            i = (rand(numel(A), 1) > p);
            Sbits(i, j) = ~Sbits(i, j);
            i = (rand(numel(A), 1) > p);
            Cbits(i, j+1) = ~Cbits(i, j+1);
        end
    else    % full adder
        Sbits(:, j) = xor(AxorBbits(:, j), Cbits(:, j));
        if k
            i = (rand(numel(A), 1) > p);
            Sbits(i, j) = ~Sbits(i, j);
        end
        AxorBandCbits(:, j) = AxorBbits(:, j) & Cbits(:, j);
        if k
            i = (rand(numel(A), 1) > p);
            AxorBandCbits(i, j) = ~AxorBandCbits(i, j);
        end
        Cbits(:, j+1) = AxorBandCbits(:, j) | AandBbits(:, j);
        if k
            i = (rand(numel(A), 1) > p) & (~constantadder | Bbits(:, j));
            % If B is const, the OR gate is only needed for hi bits of B
            Cbits(i, j+1) = ~Cbits(i, j+1);
        end
    end
end

% Repack the sum bits into integer words.
Cout(:) = Cbits(:, n+1);
twos = cast(pow2(0 : (n-1)), 'like', S);
S(:) = sum(cast(Sbits, 'like', S) .* twos(ones(numel(S), 1), :), 2, 'native');
if n <= 7
    S0 = uint8(S);
elseif n <= 15
    S0 = uint16(S);
elseif n <= 31
    S0 = uint32(S);
elseif n <= 63
    S0 = uint64(S);
else
    S0 = double(S) + double(pow2(n) * Cout);
end

% Sign-extend the results if either addend was signed.
if signedA || signedB
    signout = logical(bitget(S, n));
    % overflow = (~signA | ~signout) & (signA | ~signB) & (signB | signout);
    OxFFFF = intmax(class(S));
    OxF000 = bitshift(OxFFFF, n);
    S(signout) = bitor(S(signout), OxF000);
    S = signed(S);
    if n <= 63
        Cout2 = xor(Cout, xor(signA, signB));
        OxFFFFFFFF = intmax(class(S0));
        OxF0000000 = bitshift(OxFFFFFFFF, n);
        S0(Cout2) = bitor(S0(Cout2), OxF0000000);
        S0 = signed(S0);
    else
        S0(signout) = -S0(signout);
    end
elseif n <= 63
    S0 = bitset(S0, n+1, Cout);
end

1.2  Kogge-Stone Adder


function [S, Cout, S0] = kogge_stone_inexact_PBL(n, A, B, Cin, p)
%kogge_stone_inexact_PBL: Adds inputs A, B, and Cin, simulating a Kogge-
%Stone adder, except that each AND, XOR, and AO21 (and-or) gate has a
%random error probability equal to 1-p.
%
%Inputs:
%n: (positive integer) Number of bits processed by the adder.
%A, B: (n-bit integer arrays) Input arguments for the adder.
%  A, B, and Cin must all have the same dimensions.
%Cin: (logical array) Carry-in input for the adder.
%p: (scalar) Probability of correctness of each AND, XOR, and AO21 gate
%  inside the adder.  0 <= p <= 1.
%
%Outputs:
%S0: (2*n-bit integer array) Sum of A, B, and Cin, including carry-out bit.
%S: (n-bit integer array) Lower n bits of S0, excluding carry-out bit.
%Cout: (logical array) Carry-out bit.
%
%References:
%N. H. E. Weste and D. M. Harris, CMOS VLSI Design, 4th ed.,
%Boston: Addison-Wesley, 2011, p. 449.
%
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic Boolean logic and
%its meaning," Tech. Rep. TR-08-05, Rice University, Department of
%Computer Science, Jun 2008.

% classname0 holds the n-bit sum; classname has room for the carry-out bit.
if n <= 7
    classname0 = 'uint8';
    classname = 'uint8';
elseif n <= 8
    classname0 = 'uint8';
    classname = 'uint16';
elseif n <= 15
    classname0 = 'uint16';
    classname = 'uint16';
elseif n <= 16
    classname0 = 'uint16';
    classname = 'uint32';
elseif n <= 31
    classname0 = 'uint32';
    classname = 'uint32';
elseif n <= 32
    classname0 = 'uint32';
    classname = 'uint64';
else
    classname0 = 'uint64';
    classname = 'uint64';
end

if nargin < 5
    p = 1;
end

if (nargin < 4) || isempty(Cin)
    c = 0;
    Cin = zeros(classname);
else
    c = 1;
    Cin0 = logical(Cin);
    Cin = zeros(size(Cin), classname);
    Cin(:) = Cin0;
end

logn = log2(n) + c + 1;
S0 = zeros([max([numel(A), numel(B)]), 1], classname);
P = zeros([max([numel(A), numel(B)]), logn], classname);
G = P;

% Initial propagate/generate stage.
P(:,1) = bitxor_inexact(A(:), B(:), p, classname0, 1:n);
G(:,1) = bitand_inexact(A(:), B(:), p, classname0, 1:n);
if c
    P(:,1) = bitshift(P(:,1), 1);
    G(:,1) = bitor(bitshift(G(:,1), 1), Cin(:));
end

% Parallel-prefix stages.
i2 = zeros(classname);
for i = 2 : logn
    i1 = pow2(i-2);
    i2(:) = pow2(i1) - 1;
    i2c = bitcmp0(i2, n+c);
    P(:,i) = bitand_inexact(P(:,i-1), bitshift(P(:,i-1), i1), ...
        p, classname, (2*i1+1):(n+c+1));
    P(:,i) = bitor(bitand(P(:,i), i2c), bitand(P(:,i-1), i2));
    G(:,i) = AO21_inexact(G(:,i-1), P(:,i-1), bitshift(G(:,i-1), i1), ...
        p, classname, (i1+1):(n+c+1));
    G(:,i) = bitor(bitand(G(:,i), i2c), bitand(G(:,i-1), i2));
end

% Final sum stage.
S0(:) = bitxor_inexact(P(:,1), bitshift(G(:,logn), 1), p, ...
    classname, 2:(n+c+1));
if c
    S0(:) = bitshift(S0(:), -1);
end
if isscalar(A)
    S0 = reshape(S0, size(B));
else
    S0 = reshape(S0, size(A));
end

intmax1 = zeros(classname);
intmax1(:) = pow2(n) - 1;
S = zeros(size(S0), classname0);
S(:) = bitand(S0, intmax1);
Cout = false(size(S0));
Cout(:) = bitget(S0, n+1);

1.3  Ling Radix-4 Carry-Lookahead Adder


function [S, Cout, S0] = ling_adder_inexact_PBL(n, A, B, Cin, p)
%ling_adder: Adds inputs A, B, and Cin, simulating a valency-4
%carry lookahead adder using the Ling technique.
%
%Inputs:
%n: (positive integer) Number of bits processed by the adder.
%A, B: (n-bit integer arrays) Input arguments for the adder.
%  A, B, and Cin must all have the same dimensions.
%Cin: (logical array) Carry-in input for the adder.
%p: (scalar) Probability of correctness of each AND, OR, XOR, and AOAO2111
%  inside the adder.  0 <= p <= 1.
%
%Outputs:
%S0: (2*n-bit integer array) Sum of A, B, and Cin, including carry-out bit.
%S: (n-bit integer array) Lower n bits of S0, excluding carry-out bit.
%Cout: (logical array) Carry-out bit.
%
%Reference:
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic Boolean logic and
%its meaning," Tech. Rep. TR-08-05, Rice University, Department of
%Computer Science, Jun 2008.

switch class(A)
    case {'int8', 'uint8'}
        na = 8;
    case {'int16', 'uint16'}
        na = 16;
    case {'int32', 'uint32'}
        na = 32;
    case {'int64', 'uint64'}
        error '64-bit not supported.'
    otherwise
        error 'Addends must be of the integer classes.'
end

switch class(B)
    case {'int8', 'uint8'}
        nb = 8;
    case {'int16', 'uint16'}
        nb = 16;
    case {'int32', 'uint32'}
        nb = 32;
    case {'int64', 'uint64'}
        error '64-bit not supported.'
    otherwise
        error 'Addends must be of the integer classes.'
end

% classname0 holds the n-bit sum; classname has room for the carry-out bit.
if n <= 7
    classname0 = 'uint8';
    classname = 'uint8';
elseif n <= 8
    classname0 = 'uint8';
    classname = 'uint16';
elseif n <= 15
    classname0 = 'uint16';
    classname = 'uint16';
elseif n <= 16
    classname0 = 'uint16';
    classname = 'uint32';
elseif n <= 31
    classname0 = 'uint32';
    classname = 'uint32';
elseif n <= 32
    classname0 = 'uint32';
    classname = 'uint64';
else
    classname0 = 'uint64';
    classname = 'uint64';
end

if nargin < 5
    p = 1;
end

if (nargin < 4) || isempty(Cin)
    c = 0;
    Cin = zeros(classname);
else
    c = 1;
    Cin0 = logical(Cin);
    Cin = zeros(size(Cin), classname);
    Cin(:) = Cin0;
end

L = ceil(n / 4);
S0 = zeros([max([numel(A), numel(B)]), 1], classname);
H = zeros([max([numel(A), numel(B)]), 2], classname);
P = zeros([max([numel(A), numel(B)]), 1], classname);
I = P;

% Initial bitwise stage.
P(:,1) = bitxor_inexact(A(:), B(:), p, classname0, 1:n);
I(:,1) = bitor_inexact(A(:), B(:), p, classname0, 1:n);
H(:,1) = bitand_inexact(A(:), B(:), p, classname0, 1:n);
P(:,1) = bitshift(P(:,1), 1);
I(:,1) = bitshift(I(:,1), 1);
H(:,1) = bitshift(H(:,1), 1);
if c
    H(:,1) = bitor(H(:,1), Cin(:));
    I(:,1) = bitor(I(:,1), Cin(:));
end
I(:,1) = bitshift(I(:,1), 1);
H(:,2) = H(:,1);
I(:,2) = I(:,1);

% Process the word in groups of four bits.
for i = 1 : L
    % The name of the outer two-input gate is lost in the source listing;
    % bitor_inexact is assumed here.
    H(:,1) = bitset(H(:,1), 4*i+1, bitor_inexact( ...
        bitget(H(:,1), 4*i+1), AOAO2111_inexact( ...
        bitget(H(:,1), 4*i), bitget(I(:,1), 4*i+1), ...
        bitget(H(:,1), 4*i-1), bitget(I(:,1), 4*i), ...
        bitget(H(:,1), 4*i-2), p, classname, 1), p, ...
        classname, 1));
    I(:,1) = bitset(I(:,1), 4*i+1, bitand4_inexact( ...
        bitget(I(:,1), 4*i+1), ...
        bitget(I(:,1), 4*i), ...
        bitget(I(:,1), 4*i-1), ...
        bitget(I(:,1), 4*i-2), p, classname, 1));
    H(:,1) = bitset(H(:,1), 4*i+1, AO21_inexact( ...
        bitget(H(:,1), 4*i+1), bitget(I(:,1), 4*i+1), ...
        bitget(H(:,1), 4*i-3), p, classname, 1));
    H(:,1) = bitset(H(:,1), 4*i-2, AO21_inexact( ...
        bitget(H(:,1), 4*i-2), bitget(I(:,1), 4*i-2), ...
        bitget(H(:,1), 4*i-3), p, classname, 1));
    H(:,1) = bitset(H(:,1), 4*i-1, AO21_inexact( ...
        bitget(H(:,1), 4*i-1), bitget(I(:,1), 4*i-1), ...
        bitget(H(:,1), 4*i-2), p, classname, 1));
    H(:,1) = bitset(H(:,1), 4*i, AO21_inexact( ...
        bitget(H(:,1), 4*i), bitget(I(:,1), 4*i), ...
        bitget(H(:,1), 4*i-1), p, classname, 1));
end

% Final sum selection.
S0(:) = mux2_inexact(bitshift(H(:,1), 1), P(:,1), bitxor(P(:,1), I(:,2)), ...
    p, classname, 1:(n+2));
S0(:) = bitshift(S0(:), -1);
if isscalar(A)
    S0 = reshape(S0, size(B));
else
    S0 = reshape(S0, size(A));
end

intmax1 = zeros(classname);
intmax1(:) = pow2(n) - 1;
S = zeros(size(S0), classname0);
S(:) = bitand(S0, intmax1);
Cout = false(size(S0));
Cout(:) = bitget(S0, n+1);

1.4  Brent-Kung Adder


1  function  [  S,  Gout,  SO  ]  =  brent_kung_inexact_PBL (  n.  A,  B 

,  Gin ,  p  ) 

2  7obrent_kung_inexact_PBL  :  Adds  inputs  A,  B,  and  Gin, 

simulating  a  Brent-Kung 

3  % adder ,  except  that  each  AND,  XOR ,  and  A021  (and-or)  gate 

has  a  random  error 

4  7o probability  equal  to  1-p. 

5  % 

6  7o Inputs  : 

7  7on :  (positive  integer)  Number  of  bits  processed  by  the 

adder  . 

8  7oA ,  B:  (n-bit  integer  arrays)  Input  arguments  for  the 

adder  . 

9  7o  A,  B,  and  Gin  must  all  have  the  same  dimensions. 

10  7oGin :  (logical  array)  Garry-in  input  for  the  adder. 

11  7oP :  (scalar)  Probability  of  correctness  of  each  AND,  XOR, 

and  A021  gate 

12  7o  inside  the  adder.  0  <=  p  <=  1. 

13  7o 

14  % Outputs  : 

15  7oSO  :  (2*n-bit  integer  array)  Sum  of  A,  B,  and  Gin, 

including  carry-out  bit. 

16  7oS  :  (n-bit  integer  array)  Lower  n  bits  of  SO,  excluding 

carry  -  out  bit  . 

17  7oGout  :  (logical  array)  Garry-out  bit. 

18  % 

19  7oRef  erences  : 

20  7oN.  H.  E.  Weste  and  D.  M.  Harris,  GMOS  VLSI  Design,  4th  ed 

•  9 

21  7oBoston  :  Addison -Wesley  ,  2011,  p.  449. 

22  7o 

23  7oL .  N.  B.  Ghakrapani  and  K.  V.  Palem  ,  "A  probabilistic 

Boolean  logic  and 

24  7oits  meaning,"  Tech.  Rep.  TR-08-05,  Rice  University, 

Department  of 

25  7oGomputer  Science,  Jun  2008. 

26 

27  switch  class(A) 

28  case  {  ’  int8  ’  ,  ’uint8']- 

29  na  =  8; 

30  case  {’intl6',  ’uintl6’} 

31  na  =  16; 

32  case  { ’ int32  '  ,  ’uint32’} 

33  na  =  32; 
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case  { ’ int64 ' ,  ’uint64’} 

error  ' 64-bitunotuSupported . ’ 
otherwise 

error  ’  Addends uinust  ubeuof  uthsu integer u classes 


end 

switch  class (B) 
case  { ’ intS ’ 
nb  =  8 ; 

,  ’uint8'> 

case  {’intl6',  ’uintl6’} 
nb  =  16; 

case  { ’ int32 ' ,  ’uint32’} 

nb  =  32; 

case  { ’ int64 ' ,  ’uint64’} 

error  '  64-bituiiotuSupported  .  ’ 
otherwise 

error  ’  Addends  uniust  ubeu of  u the u  integeru  classes 


end 

if n <= 7
    classname0 = 'uint8';
    classname = 'uint8';
elseif n <= 8
    classname0 = 'uint8';
    classname = 'uint16';
elseif n <= 15
    classname0 = 'uint16';
    classname = 'uint16';
elseif n <= 16
    classname0 = 'uint16';
    classname = 'uint32';
elseif n <= 31
    classname0 = 'uint32';
    classname = 'uint32';
elseif n <= 32
    classname0 = 'uint32';
    classname = 'uint64';
else
    classname0 = 'uint64';
    classname = 'uint64';
end

if nargin < 5
    p = 1;
end

if (nargin < 4) || isempty(Cin)
    c = 0;
    Cin = zeros(classname);
else
    c = 1;
    Cin0 = logical(Cin);
    Cin = zeros(size(Cin), classname);
    Cin(:) = Cin0;
    clear Cin0
end

logn = 2 * log2(n);
S0 = zeros([max([numel(A), numel(B)]), 1], classname);
P = zeros([max([numel(A), numel(B)]), logn], classname);
G = P;

P(:,1) = bitxor_inexact(A(:), B(:), p, classname0, 1:n);
G(:,1) = bitand_inexact(A(:), B(:), p, classname0, 1:n);
if c
    P(:,1) = bitshift(P(:,1), 1);
    G(:,1) = bitor(bitshift(G(:,1), 1), Cin(:));
end

i2c = zeros(classname);
i3c = i2c;

for i = 2 : (0.5 * logn + 1)
    i1 = pow2(i-2);
    i2c(:) = sum(pow2((pow2(i)-1) : pow2(i-1) : (n+c-1)));
    i3c(:) = sum(pow2((pow2(i-1)-1) : pow2(i-1) : (n+c-1)));
    i2 = bitcmp(i2c, n+c);
    i3 = bitcmp(i3c, n+c);
    P(:,i) = bitand_inexact(P(:,i-1), bitshift(P(:,i-1), i1), p, classname);
    P(:,i) = bitor(bitand(P(:,i), i2c), bitand(P(:,i-1), i2));
    G(:,i) = AO21_inexact(G(:,i-1), P(:,i-1), bitshift(G(:,i-1), i1), p, classname);
    G(:,i) = bitor(bitand(G(:,i), i3c), bitand(G(:,i-1), i3));
end

for i = (0.5 * logn + 2) : logn
    i1 = pow2(logn-i);
    i3c(:) = sum(pow2((3*pow2(logn-i)-1) : (pow2(logn-i+1)) : (n+c-1)));
    i3 = bitcmp(i3c, n+c);
    P(:,i) = P(:,i-1);
    G(:,i) = AO21_inexact(G(:,i-1), P(:,i-1), bitshift(G(:,i-1), i1), p, classname);
    G(:,i) = bitor(bitand(G(:,i), i3c), bitand(G(:,i-1), i3));
end

S0(:) = bitxor_inexact(P(:,1), bitshift(G(:,logn),1), p, classname, 2:(n+c+1));
if c
    S0(:) = bitshift(S0(:), -1);
end

if isscalar(A)
    S0 = reshape(S0, size(B));
else
    S0 = reshape(S0, size(A));
end

intmax1 = zeros(classname);
intmax1(:) = pow2(n) - 1;
S = zeros(size(S0), classname0);
S(:) = bitand(S0, intmax1);
Cout = false(size(S0));
Cout(:) = bitget(S0, n + 1);
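
The listing above calls helper functions such as bitxor_inexact and bitand_inexact, which are not reproduced in this section. As a rough illustration only, the following Python sketch assumes that such a helper computes the exact bitwise result and then flips each output bit independently with probability 1-p. The name inexact_bitwise, its signature, and the flip model are assumptions made for this sketch, not the author's MATLAB interface.

```python
import random


def inexact_bitwise(op, a, b, p, nbits, rng=random):
    """Apply a two-input bitwise operation to the low nbits of a and b,
    then flip each output bit independently with probability 1 - p.

    Assumption: one gate per output bit, each correct with probability p.
    """
    mask = (1 << nbits) - 1
    exact = op(a, b) & mask
    flips = 0
    for k in range(nbits):
        # rng.random() lies in [0, 1), so p = 1 never flips and p = 0 always flips.
        if rng.random() >= p:
            flips |= 1 << k
    return exact ^ flips


def xor_op(a, b):
    return a ^ b


def and_op(a, b):
    return a & b
```

With p = 1 the sketch reduces to the exact operation, and with p = 0 every output bit is inverted, which gives two deterministic cases for checking the model.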
1.5 Sklansky Adder


function [S, Cout, S0] = sklansky_inexact_PBL(n, A, B, Cin, p)
%sklansky_inexact_PBL: Adds inputs A, B, and Cin, simulating a Sklansky
%adder, except that each AND, XOR, and AO21 (and-or) gate has a random
%error probability equal to 1-p.
%
%Inputs:
%n: (positive integer) Number of bits processed by the adder.
%A, B: (n-bit integer arrays) Input arguments for the adder.
%  A, B, and Cin must all have the same dimensions.
%Cin: (logical array) Carry-in input for the adder.
%p: (scalar) Probability of correctness of each AND, XOR, and AO21 gate
%  inside the adder.  0 <= p <= 1.
%
%Outputs:
%S0: (2*n-bit integer array) Sum of A, B, and Cin, including carry-out bit.
%S: (n-bit integer array) Lower n bits of S0, excluding carry-out bit.
%Cout: (logical array) Carry-out bit.
%
%References:
%N. H. E. Weste and D. M. Harris, CMOS VLSI Design, 4th ed.,
%Boston: Addison-Wesley, 2011, p. 449.
%
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic Boolean logic and
%its meaning," Tech. Rep. TR-08-05, Rice University, Department of
%Computer Science, Jun 2008.


if n <= 7
    classname0 = 'uint8';
    classname = 'uint8';
elseif n <= 8
    classname0 = 'uint8';
    classname = 'uint16';
elseif n <= 15
    classname0 = 'uint16';
    classname = 'uint16';
elseif n <= 16
    classname0 = 'uint16';
    classname = 'uint32';
elseif n <= 31
    classname0 = 'uint32';
    classname = 'uint32';
elseif n <= 32
    classname0 = 'uint32';
    classname = 'uint64';
else
    classname0 = 'uint64';
    classname = 'uint64';
end

if (~exist('p','var')) || isempty(p)
    p = 1;
end

if (~exist('Cin','var')) || isempty(Cin)
    c = 0;
    Cin = zeros(classname);
else
    c = 1;
    Cin0 = logical(Cin);
    Cin = zeros(size(Cin), classname);
    Cin(:) = Cin0;
end

logn = log2(n) + c + 1;
S0 = zeros([max([numel(A), numel(B)]), 1], classname);
P = zeros([max([numel(A), numel(B)]), logn], classname);
G = P;
bits0 = 1 : pow2(logn - 1);

P(:,1) = bitxor_inexact(A(:), B(:), p, classname0, 1:n);
G(:,1) = bitand_inexact(A(:), B(:), p, classname0, 1:n);
if c
    P(:,1) = bitshift(P(:,1), 1);
    G(:,1) = bitor(bitshift(G(:,1), 1), Cin(:));
end

for i = 2 : logn
    i1 = pow2(i-2);
    bits0 = reshape(bits0, [2*i1, numel(bits0)/(2*i1)]);
    bits1 = bitset(zeros(classname), bits0(1:i1,:), classname);
    bits1 = sum(bits1(:), 'native');
    bits1c = bitcmp0(bits1, n+c);
    bits1c_ = bits0((i1+1):end, :);
    bits2 = bitset(zeros(classname), i1 : (2*i1) : (2*pow2(logn-i)*i1-i1), classname);
    bits2 = sum(bits2(:), 'native');

    P1 = bitand(P(:,i-1), bits1);
    G1 = bitand(G(:,i-1), bits1);
    P1c = bitand(P(:,i-1), bits1c);
    G1c = bitand(G(:,i-1), bits1c);
    P2_ = bitand(P(:,i-1), bits2);
    G2_ = bitand(G(:,i-1), bits2);
    P2 = P2_;
    G2 = G2_;
    for j = 1 : (i1-1)
        P2 = P2_ + bitshift(P2, 1);
        G2 = G2_ + bitshift(G2, 1);
    end
    P2 = bitshift(P2, 1);
    G2 = bitshift(G2, 1);

    P(:,i) = bitor(P1, bitand_inexact(P1c, P2, p, classname, bits1c_));
    G(:,i) = bitor(G1, AO21_inexact(G1c, P1c, G2, p, classname, bits1c_));
end

S0(:) = bitxor_inexact(P(:,1), bitshift(G(:,logn),1), p, classname, 2:(n+c+1));
if c
    S0(:) = bitshift(S0(:), -1);
end

if isscalar(A)
    S0 = reshape(S0, size(B));
else
    S0 = reshape(S0, size(A));
end

intmax1 = zeros(classname);
intmax1(:) = intmax(classname0);
S = zeros(size(S0), classname0);
S(:) = bitand(S0, intmax1);
Cout = false(size(S0));
Cout(:) = bitget(S0, n + 1);
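
The structure of the Sklansky prefix network simulated above may be easier to follow in a bit-by-bit form. The following is a minimal Python sketch of an exact (p = 1) Sklansky adder, written under the assumption of the standard generate/propagate formulation. It does not reproduce the vectorized bit-mask approach or the error injection of the MATLAB listing, and the name sklansky_add is hypothetical.

```python
def sklansky_add(a, b, n, cin=0):
    """Exact n-bit addition using a Sklansky parallel-prefix carry network.

    Returns (sum_bits, carry_out), where sum_bits holds the low n bits.
    """
    # Bitwise propagate (XOR) and generate (AND) signals.
    p0 = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    P = list(p0)
    # Fold the carry-in into the generate signal of bit 0.
    G[0] |= P[0] & cin

    s = 0
    while (1 << s) < n:
        new_G, new_P = list(G), list(P)
        for i in range(n):
            if (i >> s) & 1:
                # Bit i sits in the upper half of its block at this stage;
                # j is the top bit of the lower half of the same block.
                j = ((i >> s) << s) - 1
                new_G[i] = G[i] | (P[i] & G[j])  # AO21 gate
                new_P[i] = P[i] & P[j]           # AND gate
        G, P = new_G, new_P
        s += 1

    # After the last stage, G[i] is the carry out of bit i.
    carries = [cin] + G[:-1]
    total = 0
    for i in range(n):
        total |= (p0[i] ^ carries[i]) << i       # final XOR stage
    return total, G[n - 1]
```

Each stage doubles the span covered by every group generate signal, so the network needs about log2(n) stages of AO21 and AND gates between the initial and final XOR stages.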
1.6 Adder Front-End


The  following  is  a  front-end  for  all  of  the  inexact  integer  adders. 

function [S, Cout, S0] = adder_inexact_PBL(arch, n, A, B, Cin, p)
%adder_inexact_PBL: Adds inputs A, B, and Cin, simulating an
%adder, except that each AND, XOR, and AO21 (and-or) gate has a random
%error probability equal to 1-p.
%
%Inputs:
%arch: (string) Inexact adder architecture: 'RC' (ripple-carry),
%  'Ling' (valency-4 carry lookahead), 'Sklansky', 'Brent-Kung', or
%  'Kogge-Stone'.
%n: (positive integer) Number of bits processed by the adder.
%A, B: (n-bit integer arrays) Input arguments for the adder.
%  A, B, and Cin must all have the same dimensions.
%Cin: (logical array) Carry-in input for the adder.
%p: (scalar) Probability of correctness of each AND, OR, XOR, AO21, and/or
%  AOAO2111 gate inside the adder.  0 <= p <= 1.
%
%Outputs:
%S0: (2*n-bit integer array) Sum of A, B, and Cin, including carry-out bit.
%S: (n-bit integer array) Lower n bits of S0, excluding carry-out bit.
%Cout: (logical array) Carry-out bit.
%
%References:
%N. H. E. Weste and D. M. Harris, CMOS VLSI Design, 4th ed.,
%Boston: Addison-Wesley, 2011, p. 449.
%
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic Boolean logic and
%its meaning," Tech. Rep. TR-08-05, Rice University, Department of
%Computer Science, Jun 2008.


if (~exist('arch','var')) || isempty(arch)
    arch = 'RC';
end

if ~exist('Cin','var')
    Cin = [];
end

if (~exist('p','var')) || isempty(p)
    p = 1;
end


curdir = pwd;
dirname = userpath;
if (dirname(end) == '/') || (dirname(end) == '\')
    dirname = dirname(1:end-1);
end


switch upper(arch)
    case {'LING', 'LING CLA', 'LING-CLA', 'CLA', 'CARRY LOOKAHEAD', ...
            'CARRY-LOOKAHEAD', 'CARRY LOOK AHEAD'}
        dirname = [dirname, '\Ling-CLA'];
        fname = 'ling_adder_inexact_PBL';
    case 'SKLANSKY'
        dirname = [dirname, '\Sklansky'];
        fname = 'sklansky_inexact_PBL';
    case {'BRENT-KUNG', 'BRENT KUNG'}
        dirname = [dirname, '\Brent-Kung'];
        fname = 'brent_kung_inexact_PBL';
    case {'KOGGE-STONE', 'KOGGE STONE'}
        dirname = [dirname, '\Kogge-Stone'];
        fname = 'kogge_stone_inexact_PBL';
    otherwise  % use inexact ripple-carry architecture
        dirname = [dirname, '\Ripple-Carry'];
        fname = 'Adder_RCA_inexact';
end


cd(dirname)

try
    [S, Cout, S0] = feval(fname, n, A, B, Cin, p);
catch exception
    cd(curdir)
    rethrow(exception)
end
cd(curdir)
1.7 Adder-Subtractor


function [S, Cout, S0] = adder_subtractor_inexact_PBL(arch, n, A, B, D, p)
%adder_subtractor_inexact_PBL adds or subtracts inputs A, B,
%depending on the value of D, simulating an inexact digital
%adder-subtractor.  Each AND, XOR, NOT, and AO21 (and-or)
%gate within the circuit has a random error probability
%equal to 1-p.
%
%Inputs:
%arch: (string) Inexact adder architecture: 'RC' (ripple
%  carry), 'Ling' (valency-4 carry lookahead), 'Sklansky',
%  'Brent-Kung', or 'Kogge-Stone'.
%n: (positive integer) Number of bits processed by the adder.
%A, B: (n-bit integer arrays) Input arguments for the adder.
%  A, B, and D must all have the same dimensions.
%D: (logical array) Determines whether the device performs
%  addition or subtraction.  If D is false, then add.  If
%  D is true, then subtract.
%p: (scalar) Probability of correctness of each AND, OR,
%  NOT, XOR, AO21, and/or AOAO2111 gate inside the adder.
%  0 <= p <= 1.
%
%Outputs:
%S0: (2*n-bit integer array) Sum A+B or difference A-B,
%  including the carry-out bit.
%S: (n-bit integer array) Lower n bits of S0, excluding the
%  carry-out bit.
%Cout: (logical array) Carry-out bit.  For addition, Cout
%  is true in case of a carry.  For subtraction, Cout is
%  false in case of a borrow.
%
%References:
%N. H. E. Weste and D. M. Harris, CMOS VLSI Design, 4th ed.,
%Boston: Addison-Wesley, 2011, p. 449.
%
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic
%Boolean logic and its meaning," Tech. Rep. TR-08-05, Rice
%University, Department of Computer Science, Jun 2008.

if (~exist('arch','var')) || isempty(arch)
    arch = 'RC';
end

if ~exist('D','var') || isempty(D)
    D = false;
end

if (~exist('p','var')) || isempty(p)
    p = 1;
end

OxFFFF = bitcmp0(zeros('like',B), n);
B_ = bitcmp_inexact(B, n, p);
D1 = cast(D,'like',B) * OxFFFF;
B1 = mux2_inexact(D1, B, B_, p);
[S, Cout, S0] = adder_inexact_PBL(arch, n, A, B1, D, p);
Cout_ = xor_inexact(Cout, D, p);
S0 = bitset(cast(S,'like',S0), n+1, Cout_);
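
The data flow of the adder-subtractor (conditionally complement B, feed D in as the carry-in, then XOR the carry-out with D) can be checked with exact arithmetic. The following Python sketch assumes all gates are correct (p = 1) and scalar inputs. The name add_sub is hypothetical, and the sketch stands in for the mux2_inexact, adder_inexact_PBL, and xor_inexact calls rather than reproducing them.

```python
def add_sub(a, b, d, n):
    """Exact n-bit adder-subtractor: computes A+B when d is false, A-B when d is true.

    Returns (s, cout, s0):
      s    - lower n bits of the result
      cout - raw carry-out of the adder (for subtraction, 0 means a borrow)
      s0   - s with bit n set to cout XOR d
    """
    mask = (1 << n) - 1
    d_bit = 1 if d else 0
    # Multiplexer: select the one's complement of B when subtracting.
    b1 = (b ^ mask) if d else (b & mask)
    # Adding ~B with a carry-in of 1 yields the two's-complement difference.
    full = (a & mask) + b1 + d_bit
    s = full & mask
    cout = (full >> n) & 1
    cout_ = cout ^ d_bit
    s0 = s | (cout_ << n)
    return s, cout, s0
```

For example, with n = 4, add_sub(5, 3, True, 4) gives s = 2 with cout = 1 (no borrow), while add_sub(3, 5, True, 4) gives s = 14 with cout = 0, which signals a borrow.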
Appendix B. Inexact Floating-Point Adder


function [Ss, Es, Ms] = Adder_floating_inexact(Sa, Ea, Ma, Sb, Eb, Mb, fmt, p)

switch upper(fmt)
    case 'BINARY16'
        ne = 5;
        nm = 10;
    case 'BINARY32'
        ne = 8;
        nm = 23;
    case 'BINARY64'
        ne = 11;
        nm = 52;
    case 'BINARY128'
        ne = 15;
        nm = 112;
    otherwise
        error 'fmt must be binary16, binary32, binary64, or binary128.'
end

if ~exist('p','var')
    p = 1;
end


[Sa1,Ea1,Ma1,~,Eb1,Mb1,D] = big_comparator(Sa,Ea,Ma,Sb,Eb,Mb,ne,nm,p);
Ediff = adder_subtractor_inexact_PBL('RC',ne,Ea1,Eb1,true,p);  % abs(Ea-Eb)
[Ma2,nm4] = prepend_mantissa(Ea1,Ma1,ne,nm,p);
Mb2 = prepend_mantissa(Eb1,Mb1,ne,nm,p);
Mb3 = rightshift_mantissa(Mb2,Ediff,nm4,ne,p);
OxFF = cast(pow2a(ne,'uint64') - 1, 'like', Ea);
Ox7FFFFF = cast(pow2a(nm4,'uint64') - 1, 'like', Ma2);
Ss = Sa1;
[~,Mc,Ms1] = adder_subtractor_inexact_PBL('RC',nm4,Ma2,Mb3,D,p);  % add
Mc1 = and_inexact(Mc,~D,p);
Ea2 = adder_inexact_PBL('RC',ne,Ea1,zeros('like',Ea1),Mc1,p);  % carry to exponent
[~,~,Ms2] = bitshifter_inexact_PBL(Ms1,-int8(Mc1),nm4+1,1,p);  % if carry-out, then shift right

Eborrow_ = cast(bitcounter(Ms2,nm4,p),'like',Ea);
Eborrow = bitcmp(Eborrow_);
gd_ = comparator_inexact_PBL(ne,Ea2,Eborrow,p);  % graceful degradation (GD) flag
gd__ = cast(gd_,'like',Ea) * OxFF;
gd___ = cast(gd_,'like',Ma2) * Ox7FFFFF;
Ea3 = adder_inexact_PBL('RC',ne,Ea2,Eborrow_,true,p);  % borrow from exponent
Ms3 = leftshift_mantissa(Ms2,nm4,p);  % shift left after subtraction

Ms3g = bitshifter_inexact_PBL(Ms2,Ea2,nm4,ne,p);  % mantissa in case of GD
Ms4 = mux2_inexact(gd___,Ms3g,Ms3,p,class(Ms3),1:nm4);

R = roundoff(Ms4,p);
Ms5 = bitshift(bitset(Ms4,nm4,0),-3);  % drop extra bits from mantissa
[Ms,Mc2] = adder_inexact_PBL('RC',nm,Ms5,zeros('like',Ms5),R,p);  % round off

Ea4 = adder_inexact_PBL('RC',ne,Ea3,zeros('like',Ea3),Mc2,p);
Es = mux2_inexact(gd__,zeros('like',Ea4),Ea4,p,class(Ea4),1:ne);  % with GD, exponent=0

end


function [Sa1, Ea1, Ma1, Sb1, Eb1, Mb1, D] = big_comparator(Sa, Ea, Ma, Sb, Eb, Mb, ne, nm, p)
%Compares the magnitude of input A with the magnitude of input B.  If A>B,
%then output A1=A and output B1=B.  If A<=B, then A1=B and B1=A.

OxFF = cast(pow2a(ne,'uint64') - 1, 'like', Ea);
Ox7FFFFF = cast(pow2a(nm,'uint64') - 1, 'like', Ma);
n = ne + nm;
classname = classn(n);
Ea_ = cast(Ea,classname);
Ma_ = cast(Ma,classname);
Eb_ = cast(Eb,classname);
Mb_ = cast(Mb,classname);
AA = bitshift(Ea_,nm) + Ma_;  % merge mantissa with exponent
BB = bitshift(Eb_,nm) + Mb_;
[AgtB,~,AeqB] = comparator_inexact_PBL(n,AA,BB,p);  % compare
AgtB_ = cast(AgtB,'like',Ea) * OxFF;
AgtB__ = cast(AgtB,'like',Ma) * Ox7FFFFF;

D = xor_inexact(Sa,Sb,p);
Z = and_inexact(D,AeqB,p);
Z_ = cast(Z,'like',Ea) * OxFF;
Z__ = cast(Z,'like',Ma) * Ox7FFFFF;
Saz = mux2_inexact(Z,Sa,false,p,class(Sa),1);
Eaz = mux2_inexact(Z_,Ea,zeros('like',Ea),p,class(Ea),1:ne);  % if B == -A, then output 0
Maz = mux2_inexact(Z__,Ma,zeros('like',Ma),p,class(Ma),1:nm);
Sbz = mux2_inexact(Z,Sb,false,p,class(Sb),1);
Ebz = mux2_inexact(Z_,Eb,zeros('like',Eb),p,class(Eb),1:ne);  % if B == -A, then output 0
Mbz = mux2_inexact(Z__,Mb,zeros('like',Mb),p,class(Mb),1:nm);

[Sa1,Sb1] = mux2_inexact(AgtB,Sbz,Saz,p,class(Sa),1);  % assign the greater value to A1
[Ea1,Eb1] = mux2_inexact(AgtB_,Ebz,Eaz,p,class(Ea),1:ne);  % and the lesser value to B1
[Ma1,Mb1] = mux2_inexact(AgtB__,Mbz,Maz,p,class(Ma),1:nm);

end


function [Ma2, nm4] = prepend_mantissa(Ea, Ma1, ne, nm, p)
%For all nonzero Ea, a 1 is prepended to the mantissa Ma.  Then three zeros
%are added to the end.

H = any_high_bits_inexact_PBL(Ea,ne,p);  % check for 0
nm4 = nm + 4;
classname = classn(nm4);  % allow space for four more bits
Ma1 = cast(Ma1,classname);
Ma2 = bitshift(bitset(Ma1,nm+1,H),3);  % prepend 1 and shift 3 to the left

end

function Mb3 = rightshift_mantissa(Mb2, Ediff, nm4, ne, p)

classname = class(Ediff);
if intmin(classname) >= 0
    classname = classn(ne+1,true);
    Ediff = cast(Ediff,classname);
end
[~,~,Mb3] = bitshifter_inexact_PBL(Mb2,-Ediff,nm4,ne,p);
end

function Ms3 = leftshift_mantissa(Ms2, nm4, p)
%Left-shift the mantissa until the most significant bit is 1.

OxFFFF = cast(pow2a(nm4,'uint64') - 1, 'like', Ms2);
Ms3 = Ms2;
for i = 1 : (nm4-1)
    Ms3a = bitshift(Ms3,1);
    Ms3a = bitand(Ms3a,OxFFFF);
    s = bitget(Ms3,nm4) * OxFFFF;
    Ms3 = mux2_inexact(s,Ms3a,Ms3,p,class(Ms2),1:nm4);
end

end

function N = bitcounter(Ms2, nm4, p)

s = logical(bitget(repmat(Ms2(:),[1,nm4]), repmat(1:nm4,[numel(Ms2),1])));

if nm4 == 14
    N = bitset(N,1,and3_inexact([ ...
        or3_inexact([~s(:,1),s(:,2:2:14)],p), ...
        or3_inexact([~s(:,3),s(:,4:2:14)],p), ...
        or3_inexact([~s(:,5),s(:,6:2:14)],p), ...
        or3_inexact([~s(:,7),s(:,8:2:14)],p), ...
        or3_inexact([~s(:,9),s(:,10:2:14)],p), ...
        or3_inexact([~s(:,11),s(:,[12 14])],p), ...
        or3_inexact([~s(:,13),s(:,14)],p)],p));
    N = bitset(N,2,and3_inexact([ ...
        or3_inexact(s(:,[1 2 5 6 9 10 13 14]),p), ...
        or3_inexact([~s(:,3),s(:,[5 6 9 10 13 14])],p), ...
        or3_inexact([~s(:,4),s(:,[5 6 9 10 13 14])],p), ...
        or3_inexact([~s(:,7),s(:,[9 10 13 14])],p), ...
        or3_inexact([~s(:,8),s(:,[9 10 13 14])],p), ...
        or3_inexact([~s(:,11),s(:,[13 14])],p), ...
        or3_inexact([~s(:,12),s(:,[13 14])],p)],p));
    N = bitset(N,3,and3_inexact([ ...
        or3_inexact([~s(:,10),s(:,11:14)],p), ...
        or3_inexact(s(:,[3:6 11:14]),p), ...
        or3_inexact([~s(:,7),s(:,11:14)],p), ...
        or3_inexact([~s(:,8),s(:,11:14)],p), ...
        or3_inexact([~s(:,9),s(:,11:14)],p)],p));
    N = bitset(N,4,or3_inexact(s(:,7:14),p));

elseif nm4 == 27
    N = repmat(intmax('uint8'),[numel(Ms2),1]);
    N = bitset(N,1,and3_inexact([ ...
        or3_inexact(s(:,1:2:27),p), ...
        or3_inexact([~s(:,10),s(:,11:2:27)],p), ...
        or3_inexact([~s(:,2),s(:,3:2:27)],p), ...
        or3_inexact([~s(:,4),s(:,5:2:27)],p), ...
        or3_inexact([~s(:,6),s(:,7:2:27)],p), ...
        or3_inexact([~s(:,8),s(:,9:2:27)],p), ...
        or3_inexact([~s(:,12),s(:,13:2:27)],p), ...
        or3_inexact([~s(:,14),s(:,15:2:27)],p), ...
        or3_inexact([~s(:,16),s(:,17:2:27)],p), ...
        or3_inexact([~s(:,18),s(:,19:2:27)],p), ...
        or3_inexact([~s(:,20),s(:,21:2:27)],p), ...
        or3_inexact([~s(:,22),s(:,23:2:27)],p), ...
        or3_inexact([~s(:,24),s(:,[25 27])],p), ...
        or3_inexact([~s(:,26),s(:,27)],p)],p));
    N = bitset(N,2,and3_inexact([ ...
        or3_inexact(s(:,[3 6 7 10 11 14 15 18 19 2 22 23 26 27]),p), ...
        or3_inexact([~s(:,4),s(:,[6 7 10 11 14 15 18 19 22 23 26 27])],p), ...
        or3_inexact([~s(:,5),s(:,[6 7 10 11 14 15 18 19 22 23 26 27])],p), ...
        or3_inexact([~s(:,8),s(:,[10 11 14 15 18 19 22 23 26 27])],p), ...
        or3_inexact([~s(:,9),s(:,[10 11 14 15 18 19 22 23 26 27])],p), ...
        or3_inexact([~s(:,12),s(:,[14 15 18 19 22 23 26 27])],p), ...
        or3_inexact([~s(:,13),s(:,[14 15 18 19 22 23 26 27])],p), ...
        or3_inexact([~s(:,16),s(:,[18 19 22 23 26 27])],p), ...
        or3_inexact([~s(:,17),s(:,[18 19 22 23 26 27])],p), ...
        or3_inexact([~s(:,20),s(:,[22 23 26 27])],p), ...
        or3_inexact([~s(:,21),s(:,[22 23 26 27])],p), ...
        or3_inexact([~s(:,24),s(:,[26 27])],p), ...
        or3_inexact([~s(:,25),s(:,[26 27])],p)],p));
    N = bitset(N,3,and3_inexact([ ...
        or3_inexact([~s(:,4),s(:,[8:11 16:19 24:27])],p), ...
        or3_inexact([~s(:,5),s(:,[8:11 16:19 24:27])],p), ...
        or3_inexact([~s(:,6),s(:,[8:11 16:19 24:27])],p), ...
        or3_inexact([~s(:,7),s(:,[8:11 16:19 24:27])],p), ...
        or3_inexact([~s(:,12),s(:,[16:19 24:27])],p), ...
        or3_inexact([~s(:,13),s(:,[16:19 24:27])],p), ...
        or3_inexact([~s(:,14),s(:,[16:19 24:27])],p), ...
        or3_inexact([~s(:,15),s(:,[16:19 24:27])],p), ...
        or3_inexact([~s(:,20),s(:,24:27)],p), ...
        or3_inexact([~s(:,21),s(:,24:27)],p), ...
        or3_inexact([~s(:,22),s(:,24:27)],p), ...
        or3_inexact([~s(:,23),s(:,24:27)],p)],p));
    N = bitset(N,4,and3_inexact([ ...
        or3_inexact(s(:,[4:11 20:27]),p), ...
        or3_inexact([~s(:,12),s(:,20:27)],p), ...
        or3_inexact([~s(:,13),s(:,20:27)],p), ...
        or3_inexact([~s(:,14),s(:,20:27)],p), ...
        or3_inexact([~s(:,15),s(:,20:27)],p), ...
        or3_inexact([~s(:,16),s(:,20:27)],p), ...
        or3_inexact([~s(:,17),s(:,20:27)],p), ...
        or3_inexact([~s(:,18),s(:,20:27)],p), ...
        or3_inexact([~s(:,19),s(:,20:27)],p)],p));
    N = bitset(N,5,or3_inexact(s(:,12:27),p));


elseif nm4 == 56
    N = repmat(intmax('uint16'),[numel(Ms2),1]);
:,36),s(:,38),s(:,40),s(:,42),s(:,44),s(:,46),s(:,48),s(:,50),s(:,52),s(:,54),s(:,56)],p), ...
        or3_inexact([~s(:,33),s(:,34:2:56)],p), ...
        or3_inexact([~s(:,35),s(:,36:2:56)],p), ...
        or3_inexact([~s(:,37),s(:,38:2:56)],p), ...
        or3_inexact([~s(:,39),s(:,40:2:56)],p), ...
        or3_inexact([~s(:,41),s(:,42:2:56)],p), ...
        or3_inexact([~s(:,43),s(:,44:2:56)],p), ...
        or3_inexact([~s(:,45),s(:,46:2:56)],p), ...
        or3_inexact([~s(:,47),s(:,48:2:56)],p), ...
        or3_inexact([~s(:,49),s(:,50:2:56)],p), ...
        or3_inexact([~s(:,51),s(:,52:2:56)],p), ...
        or3_inexact([~s(:,53),s(:,[54 56])],p), ...
        or3_inexact([~s(:,55),s(:,56)],p)],p));
    N = bitset(N,2,and3_inexact([or3_inexact([~s(:,1),s(:,3),s(:,4),s(:,7),s(:,8),s(:,11),s(:,12),s(:,15),s(:,16),s(:,19),s(:,20),s(:,23),s(:,24),s(:,27),s(:,28),s(:,31)
,47)  ,s(:  ,48)  ,s(


,39)  ,s(:  ,40)  ,s(:  ,43)  ,s(:  ,44) 
,52)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3 
,4)  ,s(:  ,7)  ,s(:  ,8)  ,s(:  ,11)  ,s( 
,23)  ,s  ( 

,36)  ,s  ( 

,51)  ,s  ( 

,  s  ( 


inexact ([~s(:  ,5)  ,s(:  ,7) 


,24) ,s( 
,39)  ,s( 
,52) ,s( 
,8)  , s  ( 


,11) 

,  s  ( 

,12) 

,  s  ( 

,15)  , 

s  ( 

,16) 

,  s  ( 

,  19) 

,  s  ( 

,20)  , 

s  ( 

,23) 

,  s  ( 

,24) 

,  s  ( 

,27) 

,  s  ( 

,28)  , 

s  ( 

,31) 

,  s  ( 

,32) 

,  s  ( 

,35)  , 

s  ( 

,36) 

,  s  ( 

,39) 

,  s  ( 

,40) 

,  s  ( 

,43)  , 

S  ( 

,44) 

,  s  ( 

,47) 

,  s  ( 

,48)  , 

S  ( 

,51) 

,  s  ( 

,52) 

,  s  ( 

,55) 

,  s  ( 

,56)] 

, p) , or3 

_ inexact 

([~s(:  ,6) 

,  S 

(:  ,7) 

,  s  ( 

,8)  , 

s  (  : 

,11)  , 

s  ( : 

,12)  ,s 

(: 

,15)  , 

s  ( : 

,16)  , 

s  (  : 

,  19)  ,s 

(: 

,20)  , 

s  ( :  , 

,23)  , 

s  (  : 

,24)  , 

s  ( : 

,27)  ,s 

(: 

,28)  , 

s  ( : 

,31)  , 

s  (  : 

,32)  ,  s 

(: 

,35)  , 

s  ( :  , 

,36)  , 

s  (  : 

,39)  , 

s  ( : 

,40)  ,s 

(: 

,43)  , 

s  ( : 

,44)  , 

s  (  : 

,47)  ,s 

(: 

,48)  , 

s  ( :  , 

,51)  , 

s  (  : 

,52)  , 

s  ( : 

,55)  ,  s 

(: 

,56)] 

,P) 

or3_ 

inexact  (  [ 

~  s 

(:  ,9) 

,  s  ( 

,11) 

,  s  ( 

,12) 

,  s  ( 

,15)  , 

s  ( 

,16) 

,  s  ( 

,  19) 

,  s  ( 

,20)  , 

S  ( 

,23) 

,  s  ( 

,24) 

,  s  ( 

,27) 

,  s  ( 

,28)  , 

s  ( 

,31) 

,  s  ( 

,32) 

,  s  ( 

,35)  , 

S  ( 

,36) 

,  s  ( 

,39) 

,  s  ( 

,40) 

,  s  ( 

,43)  , 

S  ( 

,44) 

,  s  ( 

,47) 

,  s  ( 

,48)  , 

S  ( 

,51) 

,  s  ( 

,52) 

,  s  ( 

,55) 

,  s  ( 

,56)] 

, p) , or3 

_ inexact 

([~s(:  ,10)  ,s(:  ,11)  ,£ 

s(:  ,12)  ,s(:  ,15)  ,s( 
s(:  ,27)  ,s(:  ,28)  ,s( 
s ( :  ,40)  ,  s  ( :  , 43)  , s  ( 
s  (  :  , 

)  ,  s  ( 

)  ,  s  ( 

)  ,s( 

3_inexact ( [~  s ( 


, 16)  ,s  ( 
,31)  ,s  ( 
,44)  ,s  ( 


,19)  ,s( 
,32)  ,s  ( 
,47)  ,s  ( 


,20)  ,s  ( 
,35)  ,s  ( 
,48)  ,s  ( 


,23)  ,s( 
,36)  ,s( 
,51)  ,s  ( 


,24)  ,s 
,39)  ,  s 
,52)  ,  s 


55)  ,  s 

(:  ,56)]  , 

p)  ,or3_inexact (  [~ 

s  ( 

,13) ,s( 

, 15)  ,s  ( 

,16) 

:  ,  19) 

,  s  ( 

,20) 

, s ( :  ,23)  , s  ( 

,24)  , 

s  ( 

,27)  ,s  ( 

,28) ,s( 

,31) 

:  ,32) 

,  s  ( 

,35) 

,  s ( :  ,36)  , s  ( 

,39)  , 

S  ( 

,40)  ,s  ( 

,43)  ,s( 

,44) 

:  ,47) 

,  s  ( 

,48) 

, s ( :  ,51)  , s  ( 

,52)  , 

S  ( 

,55)  ,s  ( 

, 56)  ]  , p)  , or3 

23) ,s( 
36) ,s( 
51) ,s( 

: ,19) ,s( 
:  ,32)  ,s  ( 
:  ,47)  ,s  ( 


,24)  ,s  ( 
,39)  ,s  ( 
,52)  ,s  ( 

:  ,20)  ,s  ( 
:  ,35)  ,s  ( 
:  ,48)  ,s  ( 


exact (  [~  s  ( 


s  ( 
s  ( 
s  ( 


,14) ,s( 

,27) ,s( 

,40) ,s( 

,55) ,s( 

:  ,23)  ,s  ( 

:  ,36)  ,s  ( 

:  ,51)  ,s  ( 

:  , 19)  ,s  ( 
:  ,32)  ,s  ( 
:  ,47)  ,s  ( 


,15) ,s( 
,28)  ,s  ( 
,43)  ,s  ( 


,16) ,s( 
,31) ,s( 
,44)  ,s( 


,19) ,s( 
,32)  ,s  ( 
,47)  ,s  ( 


,20)  , s  (  :  ,2 
,35)  , s  (  :  ,3 
,48)  ,s(:  ,5 


,56)] ,p) ,or3_inexact([~s( 


:  ,17)  ,s( 

,28)  ,s  (:  ,31)  ,s  ( 

,43)  ,s  (:  ,44)  ,s  ( 

,56)] ,p) ,or3_inexact([~s( 
,28)  ,s  (  :  ,31)  ,s  (  :  ,32)  ,s  ( 
,43)  ,s  (  :  ,44)  ,s  (  :  ,47)  ,s  ( 


,24)  ,s  ( 
,39)  ,s  ( 
,52) ,s( 

: ,20) ,s( 
:  ,35)  ,s  ( 
:  ,48)  ,s  ( 


,27)  ,s  ( 
,40)  ,s  ( 
,55)  ,s  ( 

:  ,23)  ,s  ( 
:  ,36)  ,s  ( 
:  ,51)  ,s  ( 


,28)  ,s  ( 
,43)  ,s  ( 


,18)  ,s( 
,31)  ,s( 
,44)  ,s( 


, 56) ] , p) , or3_ine 


,24) ,s( 
,39)  ,s  ( 
,52)  ,s  ( 


,27)  ,s 
,40)  ,s 
,55)  ,  s 


)  ,  s  ( 

)  ,s( 

)  ,s( 
27) ,s( 
40) ,s( 
55) ,s( 


)  ,or3_inexact([~s(:  ,26)  ,s(:  ,27)  ,s(:  ,28)  ,s(:  ,31)  ,s(:  ,32)  ,s 


,21) 

,  s  (  : 

,23) 

,  s  (  : 

,24) 

,  s  ( : 

,27) 

,35) 

,  s  (  : 

,36) 

,  s  ( : 

,39) 

,  s  (  : 

,40) 

,48) 

,  s  (  : 

,51) 

,  s  ( : 

,52) 

,  s  (  : 

,55) 

( :  ,22)  ,s 

(:  ,23)  ,s 

( :  ,24)  ,s 

(:  ,2 

(:  ,35)  ,s 

( :  ,36)  ,s 

( :  ,39)  ,s 

(:  ,4 

(:  ,48)  ,s 

( :  ,51)  ,s 

/—■\ 

CN 

LO 

(:  ,5 

~  s  (  : 

,25) 

,  s  ( : 

,27) 

,  s  (  : 

,28) 

,  s  (  : 

,  s  (  : 

,39) 

,  s  ( : 

,40) 

,  s  (  : 

,43) 

,  s  (  : 

,  s  (  : 

,52) 

,  s  ( : 

,55) 

,  s  (  : 

,56)] ,p) 



s(:  ,35)  ,s(:  ,36)  ,s(:  ,39)  ,s(:  ,40)  ,s(:  ,43)  ,s(:  ,44)  ,s(:  ,47)  ,s 

s(:  ,48)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([ 

[~s(:  ,29)  ,s(:  ,31)  ,s(:  ,32)  ,s(:  ,35)  ,s(:  ,36)  ,s(:  ,39)  ,s(:  ,40) 

)  ,s(:  ,43)  ,s(:  ,44)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,55) 

)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,30)  ,s(:  ,31)  ,s(:  ,32)  ,s(:  ,3 
35)  ,s(:  ,36)  ,s(:  ,39)  ,s(:  ,40)  ,s(:  ,43)  ,s(:  ,44)  ,s(:  ,47)  ,s(:  ,4 
48)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,55)  ,s(:  ,56)]  ,p),or3_inexact([~s(: 

:  ,33)  ,s(:  ,35)  ,s(:  ,36)  ,s(:  ,39)  ,s(:  ,40)  ,s(:  ,43)  ,s(:  ,44)  ,s(: 

:  ,47)  ,s(:  ,48)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_ine 

exact([~s(:  ,34)  ,s(:  ,35)  ,s(:  ,36)  ,s(:  ,39)  ,s(:  ,40)  ,s(:  ,43)  ,s 

s(:  ,44)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,55)  ,s(:  ,56)]  , 

,p)  ,or3_inexact([~s(:  ,37)  ,s(:  ,39)  ,s(:  ,40)  ,s(:  ,43)  ,s(:  ,44) 
)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3 
3_inexact([~s(:  ,38)  ,s(:  ,39)  ,s(:  ,40)  ,s(:  ,43)  ,s(:  ,44)  ,s(:  ,4 

47)  ,s(:  ,48)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexa 

act([~s(:  ,41)  ,s(:  ,43)  ,s(:  ,44)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,51)  ,s(: 

:  ,52)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,42)  ,s(:  ,43)  ,s 

s  (  :  ,44)  ,s  (:  ,47)  ,s  (  :  ,48)  ,s  (  :  ,51)  ,s  (  :  ,52)  ,s  (:  ,55)  ,s  (  :  ,56)]  , 

,p)  ,or3_inexact([~s(:  ,45)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,51)  ,s(:  ,52) 
)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,46)  ,s(:  ,47)  ,s(:  ,4 

48)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,55)  ,s(:  ,56)]  ,p),or3_inexact([~s(: 

:  ,49)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s 
s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([ 
[~s(:  ,53)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,54)  ,s(:  ,5 
55)  ,  s  (  :  ,  56)  ]  ,  p)  ]  ,  p) )  ; 

N  =  bit  set (N , 3 , and3_inexact ([ or3_ inexact  (  [~ s  (:,  1 ),  s  ( : 


•  5 

6)  ,  s  ( 

:  ,7) 

, s ( : ,8) 

,  s  (  : 

,13)  , 

s  ( :  , 

14) 

s  (  :  , 

15)  , 

s  ( :  , 

16) 

,  s  (  : 

21 

1) 

,  s  ( :  , 

22)  ,£ 

3(: ,23) 

,  s  ( : 

,24)  , 

s  (  :  , 

29) 

s  ( :  , 

30)  , 

s  (  :  , . 

31) 

,  s  ( : 

,32 

2) 

,  s  ( :  , 

37)  ,£ 

3(: ,38) 

,  s  ( : 

,39)  , 

s  (  :  , 

40) 

s  ( :  , 

45)  , 

s  (  :  , ' 

46) 

,  s  ( : 

,47 

7) 

,  s  ( :  , 

48)  ,£ 

3(:  ,5)  , 

s  (  :  , 

53)  ,  s 

( :  ,54)  ,£ 

3 ( :  ,55)  ,  s 

(: ,56)] 

,p)  , 

or3 

3_ 

inexact  (  [' 

' s ( : ,2) 

,  s  ( : 

,6)  ,  s 

(:  ,7)  ,s 

C:  ,8) 

,  s  (  : 

,13) 

s  ( 

:  ,14) 

s 

s  ( 

:  ,  15) 

,  s  ( : 

,16) ,s( 

:  ,21)  ,s(: 

,22) 

,  s  ( 

,23) 

,  s  (  : 

,24) 

s  ( 

:  ,29) 

s 

s  ( 

:  ,30) 

,  s  ( : 

,31) ,s( 

:  ,32)  ,s(: 

,37) 

,  s  ( 

,38) 

,  s  ( : 

,39) 

s  ( 

:  ,40) 

S 

s  ( 

:  ,45) 

,  s  (  : 

,46) ,s( 

:  ,47)  ,s(: 

,48) 

,  s  ( 

,5)  , 

s  (  :  , 

53)  ,s 

3(: 

,54) 

, 

3( 

(: 

,55)  , 

s ( : , 56) ] , p) 

,  or3 

_inexact  (  [~  s 

C:  ,3) 

,  s  ( : 

,6)  , 

3(: 

,7)  , 

s 

(: 

•  5 

8)  ,  s  ( 

:  ,  13) 

, s ( :  , 14)  ,  s 

( :  , 15)  ,s  ( 

:  ,16)  ,s( 

:  ,21)  ,s  ( 

,22)  ,  s 

( 

J 

,23)  ,s  ( 

:  ,24) 

, s ( :  ,29)  ,  s 

( :  ,30)  ,s  ( 

:  ,31)  ,s( 

:  ,32)  ,s  ( 

,37)  ,s 

( 

) 

,38)  ,s  ( 

:  ,39) 

, s ( :  ,40)  ,  s 

( :  ,45)  ,s  ( 

:  ,46)  ,s( 

:  ,47)  ,s  ( 

,48)  ,s 

( 

) 

,5)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(: 


:  ,4) 

,  s  (  : 

,5) 

, S ( :  ,6)  , 

s ( : ,7) 

,  s  ( 

:  ,8) 

,  s  ( 

:  ,13) 

s  ( 

,14) 

,  s  (  : 

,15)  , 

,  s  (  : 

,16) 

s  ( 

,21)  ,s  ( 

:  ,22)  , 

s  (  : 

,23) 

,  s  ( 

:  ,24) 

s  ( 

,29) 

,  s  ( : 

,30)  , 

,  s  (  : 

,31) 

s  ( 

,32)  ,s  ( 

:  ,37)  , 

s  (  : 

,38) 

,  s  ( 

:  ,39) 

S  ( 

,40) 

,  s  ( : 

,45)  , 

,  s  (  : 

,46) 

S  ( 

,47)  ,s  ( 

:  ,48)  , 

s  (  : 

,53) 

,  s  ( 

:  ,54) 

S  ( 

,55) 

,  s  ( : 

,56)] 

P) 

, or3_inexact (  [~s ( 

, 10)  ,s  ( 

,13) 

,  s  ( 

, 14)  ,s  ( 

,15) 

,  s  ( 

,16 

,  s 

(:  ,21)  ,s(:  ,22)  ,s( 

,23) ,s( 

,24) 

,  s  ( 

,29)  ,s( 

,30) 

,  s  ( 

,31 

,  s 

(:  ,32)  ,s(:  ,37)  ,s( 

,38) ,s( 

,39) 

,  s  ( 

,40)  ,s( 

,45) 

,  s  ( 

,46 



6)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or 


r3_inexact  (  [~s ( 

,11) 

s  ( 

, 13)  ,s  ( 

,14) 

,  s  ( 

,15) ,s(: 

,16)  ,s(:  , 

,21)  ,s  ( 

,22)  ,s  ( 

,23) 

s  ( 

,24)  ,s  ( 

,29) 

,  s  ( 

,30) ,s(: 

,31)  ,s(:  , 

,32)  ,s  ( 

,37)  ,s  ( 

,38) 

s  ( 

,39)  ,s  ( 

,40) 

,  s  ( 

,45) ,s(: 

,46)  ,s(:  , 

,47)  ,s  ( 

,48)  ,s  ( 

,53) 

s  ( 

,54)  ,s  ( 

,55) 

,  s  ( 

,56)] ,p) 

, or3_inex 

xact  (  [~ 

s  (  : 

,12)  , 

s  (  : 

,13)  , 

s  (  : 

,14)  ,s 

( 

:  ,15)  , 

s  (  : 

,16)  , 

s  ( : 

,21) ,s( 

(: 

,22)  , 

s  (  : 

,23)  , 

s  ( : 

,24)  , 

s  (  : 

,29)  ,  s 

( 

:  ,30)  , 

s  (  : 

,31)  , 

s  ( : 

,32) ,s( 

(: 

,37)  , 

s  (  : 

,38)  , 

s  ( : 

,39)  , 

s  (  : 

,40)  ,s 

( 

:  ,45)  , 

s  (  : 

,46)  , 

s  ( : 

,47) ,s( 

(: 

,48)  , 

s  (  : 

,53)  , 

s  ( : 

,54)  , 

s  (  : 

,55)  ,  s 

( 

: ,56)] 

,P) 

o 

CO 

1 

inexact ( [~ 

~  s 

(:  ,9) 

,  s  ( 

:  ,13) 

,  s  ( 

:  ,14) 

,  s  ( 

:  ,15)  , 

s 

( : , 16) 

,  s  ( 

:  ,21) 

,  s  ( 

:  ,22)  ,  s 

S  ( 

:  ,23) 

,  s  ( 

:  ,24) 

,  s  ( 

:  ,29) 

,  s  ( 

:  ,30)  , 

s 

( : ,31) 

,  s  ( 

:  ,32) 

,  s  ( 

:  ,37)  ,s 

S  ( 

:  ,38) 

,  s  ( 

:  ,39) 

,  s  ( 

:  ,40) 

,  s  ( 

:  ,45)  , 

s 

( : ,46) 

,  s  ( 

;  ,47) 

,  s  ( 

w 

CO 

S  ( 

:  ,53) 

,  s  ( 

:  ,54) 

,  s  ( 

:  ,55) 

,  s  ( 

: ,56)] 

p) , or3 

_inexact 

([- 

s(:  ,17) 

)  ,s(: 

,21)  ,s  ( 

,22)  ,s  (  : 

,23)  ,s  ( 

,24)  ,s  (  : 

,29) 

, s  ( :  ,  30)  , 

s(:  ,31) 

)  ,  s  (  : 

,32)  ,s  ( 

,37) ,s(: 

,38)  ,s  ( 

,39) ,s(: 

,40) 

,s(:  ,45)  , 

s(:  ,46) 

)  ,  s  (  : 

,47)  ,s  ( 

,48) ,s(: 

,53)  ,s  ( 

,54) ,s(: 

,55) 

,s(:  ,56)] 

, p) , or3 

3_inexact  (  [~  s  ( 
29)  ,s( 

40)  ,s( 

55)  ,s( 


: ,18) ,s( 

,21)  ,s  ( 

,22) ,s( 

,23) 

, s  (  :  ,  24)  ,  s  ( 

,2 

: ,31) ,s( 

,32)  ,s  ( 

,37) ,s( 

,38) 

, s ( :  ,39)  ,  s  ( 

,4 

: ,46) ,s( 

,47)  ,s  ( 

,48)  ,s( 

,53) 

,  s  (  :  ,54)  ,  s  ( 

,5 

,30)  ,s  ( 

,45)  ,s  ( 

,56)]  ,p)  ,or3_inexact([~s( 


:  ,23)  ,s  ( 

:  ,38)  ,s  ( 

:  ,53)  ,s  ( 
s  (  : 
s  (  : 
s  (  : 

inexact (  [~  s ( 
)  , s ( :  , 38)  , s  ( 
)  , s ( :  , 53)  , s  ( 


,24) ,s( 
,39)  ,s  ( 
,54)  ,s  ( 


,29)  ,s  ( 
,40)  ,s  ( 
,55)  ,s  ( 


:  , 19)  ,s  ( 

,21) ,s(: 

,22) 

,  s  (  : 

:  ,31)  ,s  ( 

,32)  ,s  (  : 

,37) 

,  s  (  : 

:  ,46)  ,s  ( 

,47)  ,s  (  : 

,48) 

,  s  (  : 

,30) ,s( 

,45)  ,s  ( 

,56)] ,p) ,or3_inexact([~s( 


:  ,21)  ,s  ( 

,22) ,s( 

,23)  ,s  ( 

,24) 

, s ( :  ,29)  , s  ( 

,30)  , 

s ( :  ,31)  ,  s 

:  ,32)  ,s  ( 

,37) ,s( 

,38)  ,s  ( 

,39) 

, s ( :  ,40)  ,  s  ( 

,45)  , 

s ( :  ,46)  ,  s 

:  ,47)  ,s  ( 

,48)  ,s  ( 

,53)  ,s  ( 

,54) 

, s ( :  ,55)  , s  ( 

,56)] 

, p) , or3_i 

,20)  ,  s 


,25)  ,s  ( 
,39)  ,s  ( 
,54)  ,s  ( 


,29)  ,s  ( 
,40)  ,s  ( 
,55)  ,s  ( 


,30)  ,s(:  ,31)  ,s(:  ,32)  ,s(:  ,37) 
,45)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48) 


26)  ,s( 
39)  ,s( 
54)  ,s( 


,29)  ,s  ( 
,40)  ,s  ( 
,55)  ,s  ( 


,56)]  ,p)  ,or3_inexact([~s( 
,30)  ,s(:  ,31)  ,s(:  ,32)  ,s(:  ,37)  ,s(:  ,38)  ,s( 
,45)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,53)  ,s( 


,2 

,3 

,5 


,56)]  ,p)  ,or3_inexact  ([~s( 
,30)  ,s(:  ,31)  ,s(:  ,32)  ,s(:  ,37)  ,s(:  ,38)  ,s( 
,45)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,53)  ,s( 


,27)  ,s( 
,39) ,s( 
,54)  ,s  ( 


,56)]  ,p)  ,or3_inexact ([~s( 
s  (  :  ,32)  ,s  (:  ,37)  ,s  (  :  ,38)  ,s  ( 
s(:  ,47)  ,s(:  ,48)  ,s(:  ,53)  ,s( 


:  ,29)  ,s( 

:  ,40)  ,s( 

:  ,55)  ,s( 
,30)  , s ( :  ,31)  ,  s 
,45)  ,s ( :  ,46)  ,s 
, 56) ] , p) , or3_i 


:  ,28)  , s ( :  ,29)  ,  s  ( 

:  ,39)  ,s ( :  ,40)  ,s  ( 

:  ,54)  ,s ( :  ,55)  ,s  ( 
inexact([~s(: ,33) ,s(: ,37) ,s(: ,38) ,s(: ,39) ,s(: ,40) ,s(: ,45) 
)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56) 
)]  ,p)  ,or3_inexact([~s(:  ,34)  ,s(:  ,37)  ,s(:  ,38)  ,s(:  ,39)  ,s(:  ,4 

40)  ,s(:  ,45)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,5 

55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,35)  ,s(:  ,37)  ,s(:  ,38)  ,s(: 

:  ,39)  ,s(:  ,40)  ,s(:  ,45)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,53)  ,s(: 

:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,36)  ,s(:  ,37)  ,s 

s  (  :  ,38)  ,s  (:  ,39)  ,s  (  :  ,40)  ,s  (:  ,45)  ,s  (  :  ,46)  ,s  (:  ,47)  ,s  (  :  ,48)  ,s 

s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,41) 


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)  ,s(:  ,45)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55) 
)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,42)  ,s(:  ,45)  ,s(:  ,46)  ,s(:  ,4 
47)  ,s(:  ,48)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexa 
act([~s(:  ,43)  ,s(:  ,45)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,53)  ,s(: 

:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,44)  ,s(:  ,45)  ,s 
s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  , 
,p)  ,or3_inexact([~s(:  ,49)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56) 
)]  ,p)  ,or3_inexact([~s(:  ,50)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,5 
56)]  ,p),or3_inexact([~s(:  ,51)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(: 

:  ,56)]  ,p)  ,or3_inexact([~s(:  ,52)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s 
s  (  :  , 56) ]  , p) ]  , p) )  ; 

N  =  bit  set (N , 4 , and3_inexact ( [ or3_inexact  (  [s  ( :  ,  1 )  ,  s  ( :  , 


,2)  , s  (  :  , 

3)  ,s(:  , 

4) 

,  S  (  :  , 5)  , 

si:  ,6)  ,  s 

(:  ,7)  ,s( 

8)  ,  s  (  : 

,17) ,s(: 

:  ,18)  ,s( 

:  ,19)  ,s 

(: 

,20)  ,s  (: 

,21)  ,s  (  : 

,22) ,s(: 

,23)  ,s  (  : 

,24) ,s(: 

:  ,33)  ,s  ( 

:  ,34)  ,s 

(: 

,35) ,s(: 

,36)  ,s(: 

,37) ,s(: 

,38)  ,s  (  : 

,39)  ,s(: 

:  ,40)  ,s  ( 

:  ,49)  ,s 

(: 

,50) ,s(: 

,51) ,s(: 

,52) ,s(: 

,53) ,s(: 

,54)  ,s(: 

:  ,55)  ,s  ( 

:  ,56)]  , 

P) 

, or3_inexact  (  [~s 

(:  ,10)  ,s 

( 

,17)  ,s 

(:  ,18)  ,s 

s ( :  , 19)  , 

s (: ,20) 

,  s 

( :  ,21)  ,s 

(:  ,22)  ,s 

( :  ,23)  ,s 

( 

,24)  ,s 

( :  ,33)  ,s 

s (  :  ,34)  , 

s(:  ,35) 

,  s 

( :  ,36)  ,s 

(:  ,37)  ,s 

( :  ,38)  ,s 

( 

,39)  ,  s 

( :  ,40)  ,s 

s ( :  ,49)  , 

s(:  ,50) 

,  S 

( :  ,51)  ,s 

(:  ,52)  ,s 

( :  ,53)  ,s 

( 

,54)  ,s 

( :  ,55)  ,s 

s(:  ,56)]  , 

1 

ro 

o 

inexact  (  [ 

~  s  (  : 

,11) 

,  s  ( : 

,17) 

,  s  (  :  , 

18) 

,  s  ( : 

,19) 

)  ,s(: 

,20) 

,s(:  ,21)  ,s(: 

,22) 

,  s  ( : 

,23) 

,  s  (  : 

,24) 

,  s  ( :  , 

33) 

,  s  (  : 

,34) 

)  ,s(: 

,35) 

,  s  (  :  ,36)  ,  s  (  : 

,37) 

,  s  ( : 

,38) 

,  s  (  : 

,39) 

,  s  ( :  , 

40) 

,  s  (  : 

,49) 

)  ,  s  (  : 

,50) 

,s(:  ,51)  ,s(: 

,52) 

,  s  ( : 

,53) 

,  s  (  : 

,54) 

,  s  ( :  , 

55) 

,  s  (  : 

,56) 

)  ]  ,  p) 

,  or3 

_inexact  (  [~  s 

(:  ,12)  ,s 

(: 

,17)  ,s 

(:  ,18)  ,s( 

:  ,  19)  ,s 

(:  ,2 

20)  ,  s 

(:  ,21)  ,s(:  , 

22)  ,  s 

( :  ,23)  ,s 

(: 

,24)  ,s 

( :  ,33)  ,s  ( 

:  ,34)  ,s 

(:  ,3 

35)  ,  s 

(:  ,36)  ,s(:  , 

37)  ,s 

( :  ,38)  ,s 

(: 

,39)  ,  s 

(  :  ,40)  ,s  ( 

:  ,49)  ,s 

(:  ,5 

50)  ,  s 

(:  ,51)  ,s(:  , 

52)  ,  s 

( :  ,53)  ,s 

(: 

,54)  ,s 

( :  ,55)  ,s  ( 

:  ,56)]  , 

p)  ,  0 

or3_inexact (  [~s ( 

:  ,13) 

,  s  ( : 

,17) 

,  s 

(: 

,  18) 

,  s  (  : 

,19)  , 

s  (  : 

,20) 

,  s  (  : 

:  ,21) 

,  s  (  : 

,22)  ,s  ( 

:  ,23) 

,  s  ( : 

,24) 

,  s 

(: 

,33) 

,  s  ( : 

,34)  , 

s  (  : 

,35) 

,  s  (  : 

:  ,36) 

,  s  ( : 

,37)  ,s  ( 

:  ,38) 

,  s  ( : 

,39) 

,  S 

(: 

,40) 

,  s  ( : 

,49)  , 

s  (  : 

,50) 

,  s  (  : 

:  ,51) 

,  s  (  : 

,52)  ,s  ( 

:  ,53) 

,  s  ( : 

,54) 

,  S 

(: 

,55) 

,  s  ( : 

,56)] 

,P) 

,  or3 

_ine 

exact  (  [~  s  ( 

,14) 

,  s  ( 

, 17)  ,s  ( 

,18) 

,  s  ( 

,19) ,s(: 

,20) 

, s ( :  ,21)  ,  s 

s  ( 

,22)  ,s  ( 

,23) 

,  s  ( 

,24)  ,s  ( 

,33) 

,  s  ( 

,34) ,s(: 

,35) 

, s ( :  ,36)  ,  s 

s  ( 

,37)  ,s  ( 

,38) 

,  s  ( 

,39)  ,s  ( 

,40) 

,  s  ( 

,49)  ,s  (: 

,50) 

, s ( :  ,51)  ,  s 

s  ( 

,52)  ,s  ( 

,53) 

,  s  ( 

,54)  ,s  ( 

,55) 

,  s  ( 

,56)] ,p) 

,  or3 

_inexact  (  [ 

[~s(:  ,15)  ,s(:  ,17)  ,s(:  ,18)  ,s(:  ,19)  ,s(:  ,20)  ,s(:  ,21)  ,s(:  ,22) 

)  ,s(:  ,23)  ,s(:  ,24)  ,s(:  ,33)  ,s(:  ,34)  ,s(:  ,35)  ,s(:  ,36)  ,s(:  ,37) 

)  ,s(:  ,38)  ,s(:  ,39)  ,s(:  ,40)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52) 

)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s( 

16)  ,s(:  ,17)  ,s(:  ,18)  ,s(:  ,19)  ,s(:  ,20)  ,s(:  ,21)  ,s(:  ,22)  ,s( 

23)  ,s(:  ,24)  ,s(:  ,33)  ,s(:  ,34)  ,s(:  ,35)  ,s(:  ,36)  ,s(:  ,37)  ,s( 

38)  ,s(:  ,39)  ,s(:  ,40)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s( 

53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,9)  ,s(:  , 

,17)  ,s(:  ,18)  ,s(:  ,19)  ,s(:  ,20)  ,s(:  ,21)  ,s(:  ,22)  ,s(:  ,23)  ,s(:  , 
,24)  ,s(:  ,33)  ,s(:  ,34)  ,s(:  ,35)  ,s(:  ,36)  ,s(:  ,37)  ,s(:  ,38)  ,s(:  , 


,1 

,2 

,3 
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,39)  ,s(:  ,40)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  , 
,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,25)  ,s(:  ,33)  ,s( 
(:  ,34)  ,s(:  ,35)  ,s(:  ,36)  ,s(:  ,37)  ,s(:  ,38)  ,s(:  ,39)  ,s(:  ,40)  ,s( 

(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s( 

(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,26)  ,s(:  ,33)  ,s(:  ,34)  ,s(:  ,35)  , 
,s(:  ,36)  ,s(:  ,37)  ,s(:  ,38)  ,s(:  ,39)  ,s(:  ,40)  ,s(:  ,49)  ,s(:  ,50)  , 

,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_ 

_inexact([~s(:  ,27)  ,s(:  ,33)  ,s(:  ,34)  ,s(:  ,35)  ,s(:  ,36)  ,s(:  ,37 
7)  ,s(:  ,38)  ,s(:  ,39)  ,s(:  ,40)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52 

2)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  , 

,28)  ,s(:  ,33)  ,s(:  ,34)  ,s(:  ,35)  ,s(:  ,36)  ,s(:  ,37)  ,s(:  ,38)  ,s(:  , 

,39)  ,s(:  ,40)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  , 

,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,29)  ,s(:  ,33)  ,s( 
(:  ,34)  ,s(:  ,35)  ,s(:  ,36)  ,s(:  ,37)  ,s(:  ,38)  ,s(:  ,39)  ,s(:  ,40)  ,s( 

(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s( 

(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,30)  ,s(:  ,33)  ,s(:  ,34)  ,s(:  ,35)  , 
,s(:  ,36)  ,s(:  ,37)  ,s(:  ,38)  ,s(:  ,39)  ,s(:  ,40)  ,s(:  ,49)  ,s(:  ,50)  , 

,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_ 

_inexact([~s(:  ,31)  ,s(:  ,33)  ,s(:  ,34)  ,s(:  ,35)  ,s(:  ,36)  ,s(:  ,37 
7)  ,s(:  ,38)  ,s(:  ,39)  ,s(:  ,40)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52 

2)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  , 

,32)  ,s(:  ,33)  ,s(:  ,34)  ,s(:  ,35)  ,s(:  ,36)  ,s(:  ,37)  ,s(:  ,38)  ,s(:  , 

,39)  ,s(:  ,40)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  , 

,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,41)  ,s(:  ,49)  ,s( 
(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p 
p)  ,or3_inexact([~s(:  ,42)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  , 
,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,43 

3)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55 
5)  ,s(:  ,56)]  ,p),or3_inexact([~s(:  ,44)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  , 
,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inex 
xact([~s(:  ,45)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s( 
(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,46)  ,s(:  ,49)  , 
,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)] 
] ,p) ,or3_inexact([~s(: ,47) ,s(: ,49) ,s(: ,50) ,s(: ,51) ,s(: ,52 
2)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  , 
,48)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  , 


,55)  ,  s  (  :  ,  56)  ]  ,  p)  ]  ,  p)  )  ; 


N  = 

bitset (N ,5 , and3_ 

inexact ( [or3_inexact ( [s ( 

:  ,9) 

,  s  ( : 

,10) 

,  s  ( 

,11)  ,s(:  ,12)  ,s(: 

, 13)  ,s  ( 

,14)  ,s(: 

,15) ,s(: 

,16) 

,  s  ( : 

,17) 

,  s  ( 

,18)  ,s(:  ,19)  ,s(: 

,20)  ,s  ( 

,21)  ,s  (  : 

,22)  ,s(: 

,23) 

,  s  (  : 

,24) 

,  s  ( 

,41)  ,s(:  ,42)  ,s(: 

,43)  ,s  ( 

,44) ,s(: 

,45) ,s(: 

,46) 

,  s  ( : 

,47) 

,  s  ( 

,48)  ,s(:  ,49)  ,s(: 

,50)  ,s  ( 

,51)  ,s(: 

,52)  ,s(: 

,53) 

,  s  ( : 

,54) 

,  s  ( 

,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s 

(:  ,25)  ,s 

( :  ,41)  ,s 

,42)  ,s(:  ,43)  ,s(:  ,44)  ,s(:  ,45)  ,s(:  ,46)  ,s( 
,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s( 


,47)  ,s  ( 
,54)  ,s  ( 


,48)  ,s( 
,55)  ,s( 


,56)]  ,p)  ,or3_inexact([~s(:  ,26)  ,s(:  ,41)  ,s(:  ,42)  ,s(:  ,43) 
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,s(:  ,44)  ,s(:  ,45)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,49)  ,s(:  ,50)  , 

,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_ 

_inexact([~s(:  ,27)  ,s(:  ,41)  ,s(:  ,42)  ,s(:  ,43)  ,s(:  ,44)  ,s(:  ,45 
5)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52 
2)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  , 
,28)  ,s(:  ,41)  ,s(:  ,42)  ,s(:  ,43)  ,s(:  ,44)  ,s(:  ,45)  ,s(:  ,46)  ,s(:  , 

,47)  ,s(:  ,48)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  , 

,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,29)  ,s(:  ,41)  ,s( 
(:  ,42)  ,s(:  ,43)  ,s(:  ,44)  ,s(:  ,45)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s( 

(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s( 

(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,30)  ,s(:  ,41)  ,s(:  ,42)  ,s(:  ,43)  , 
,s(:  ,44)  ,s(:  ,45)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,49)  ,s(:  ,50)  , 

,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_ 

_inexact([~s(:  ,31)  ,s(:  ,41)  ,s(:  ,42)  ,s(:  ,43)  ,s(:  ,44)  ,s(:  ,45 
5)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52 

2)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  , 

,32)  ,s(:  ,41)  ,s(:  ,42)  ,s(:  ,43)  ,s(:  ,44)  ,s(:  ,45)  ,s(:  ,46)  ,s(:  , 

,47)  ,s(:  ,48)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  , 

,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,33)  ,s(:  ,41)  ,s( 
(:  ,42)  ,s(:  ,43)  ,s(:  ,44)  ,s(:  ,45)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s( 

(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s( 

(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,34)  ,s(:  ,41)  ,s(:  ,42)  ,s(:  ,43)  , 
,s(:  ,44)  ,s(:  ,45)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,49)  ,s(:  ,50)  , 

,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_ 

_inexact([~s(:  ,35)  ,s(:  ,41)  ,s(:  ,42)  ,s(:  ,43)  ,s(:  ,44)  ,s(:  ,45 
5)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52 

2)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  , 

,36)  ,s(:  ,41)  ,s(:  ,42)  ,s(:  ,43)  ,s(:  ,44)  ,s(:  ,45)  ,s(:  ,46)  ,s(:  , 

,47)  ,s(:  ,48)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  , 

,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,37)  ,s(:  ,41)  ,s( 
(:  ,42)  ,s(:  ,43)  ,s(:  ,44)  ,s(:  ,45)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s( 

(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s( 

(:  ,56)]  ,p)  ,or3_inexact([~s(:  ,38)  ,s(:  ,41)  ,s(:  ,42)  ,s(:  ,43)  , 
,s(:  ,44)  ,s(:  ,45)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,49)  ,s(:  ,50)  , 

,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_ 

_inexact([~s(:  ,39)  ,s(:  ,41)  ,s(:  ,42)  ,s(:  ,43)  ,s(:  ,44)  ,s(:  ,45 
5)  ,s(:  ,46)  ,s(:  ,47)  ,s(:  ,48)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52 

2)  ,s(:  ,53)  ,s(:  ,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)  ,or3_inexact([~s(:  , 

,40)  ,s(:  ,41)  ,s(:  ,42)  ,s(:  ,43)  ,s(:  ,44)  ,s(:  ,45)  ,s(:  ,46)  ,s(:  , 

,47)  ,s(:  ,48)  ,s(:  ,49)  ,s(:  ,50)  ,s(:  ,51)  ,s(:  ,52)  ,s(:  ,53)  ,s(:  , 

,54)  ,s(:  ,55)  ,s(:  ,56)]  ,p)]  ,p))  ; 


N  =  bitset  (N  ,  6  , 

or3_inexact  (  [ 

s ( :  ,25)  , s  ( 

:  ,26) 

,  s  ( 

,27)  ,s 

:  ,28)  ,s  ( 

,29) 

,  s  ( 

,30)  ,s  ( 

,31)  , 

s ( :  ,32)  , s  ( 

:  ,33) 

,  s  ( 

,34)  ,s 

:  ,35)  ,s  ( 

,36) 

,  s  ( 

,37)  ,s  ( 

,38)  , 

s ( :  ,39)  , s  ( 

:  ,40) 

,  s  ( 

,41)  ,s 

:  ,42)  ,s  ( 

,43) 

,  s  ( 

,44)  ,s  ( 

,45)  , 

s ( :  ,46)  , s  ( 

:  ,47) 

,  s  ( 

,48)  ,s 

:  ,49)  ,s  ( 

,50) 

,  s  ( 

,51)  ,s  ( 

,52)  , 

s (  :  ,53)  , s  ( 

:  ,54) 

,  s  ( 

,55), s(:,56)], p));
end
N = reshape(N, size(Ms2));
end

function R = roundoff( Ms3, p )
s  = logical(bitget(Ms3, 2));
r  = logical(bitget(Ms3, 3));
m0 = logical(bitget(Ms3, 4));
R  = and_inexact(r, or_inexact(m0, s, p), p);
end
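The rounding decision in the roundoff listing can be checked against an exact counterpart. The sketch below is illustrative only: it replaces and_inexact and or_inexact with MATLAB's exact logical operators (the p = 1 case), and the name roundoff_exact and the sample values are assumptions that do not appear in the original code.

```matlab
% Exact (error-free) counterpart of the roundoff decision R = r AND (m0 OR s).
% roundoff_exact is a hypothetical name used only for this illustration.
roundoff_exact = @(Ms3) logical(bitget(Ms3, 3)) & ...
    (logical(bitget(Ms3, 4)) | logical(bitget(Ms3, 2)));

% Sample inputs (binary): 0100, 0110, 1100, 1000, 0010.
Ms3 = uint8([4 6 12 8 2]);
R = roundoff_exact(Ms3);

% R is true only where bit 3 is set and at least one of bits 4 or 2 is set.
disp(R);
```

Comparing this output with roundoff(Ms3, p) for p < 1 gives a quick per-element view of where the inexact gates change the decision.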
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Appendix  C.  Inexact  Integer  Multipliers 


3.1 Shift-and-Add Multiplier

function [ P ] = Multiplier_basic_inexact( A, B, na, nb, p, bit )
%bit: The highest-order bit which can be inexact.

switch class(A)
    case {'int8','uint8'}
        na0 = 8;
    case {'int16','uint16'}
        na0 = 16;
    case {'int32','uint32'}
        na0 = 32;
    case {'int64','uint64'}
        na0 = 64;
    otherwise
        error 'Multiplicands must be of the integer classes.'
end

[A, signA, signedA] = unsigned(A);

switch class(B)
    case {'int8','uint8'}
        nb0 = 8;
    case {'int16','uint16'}
        nb0 = 16;
    case {'int32','uint32'}
        nb0 = 32;
    case {'int64','uint64'}
        nb0 = 64;
    otherwise
        error 'Multiplicands must be of the integer classes.'
end

[B, signB, signedB] = unsigned(B);

if (~exist('na','var')) || isempty(na)
    na = na0;
end

if (~exist('nb','var')) || isempty(nb)
    nb = nb0;
end

if ~exist('p','var')
    p = 1;
end

if exist('bit','var')
    pi = 1;    % pi is the probability of correctness for signed math
else
    bit = Inf;
    pi = p;
end

if signedA
    OxFFFF = cast(-1,'like',A);
    OxF000 = bitshift(OxFFFF,na);
    Ox0FFF = bitcmp(OxF000);
    A = bitand(A,Ox0FFF);
    A = bitxor_inexact(A, cast(signA,'like',Ox0FFF) * Ox0FFF, pi, ...
        class(A), 1:min([na,bit-nb]));    % ones complement
    A = Adder_RCA_inexact(na,A,zeros('like',A),signA,pi, ...
        1:min([na,bit-nb]),false);    % twos complement
else
    OxFFFF = intmax(class(A));
    OxF000 = bitshift(OxFFFF,na);
    Ox0FFF = bitcmp(OxF000);
    A = bitand(A,Ox0FFF);
end

if signedB
    OxFFFF = cast(-1,'like',B);
    OxF000 = bitshift(OxFFFF,nb);
    Ox0FFF = bitcmp(OxF000);
    B = bitand(B,Ox0FFF);
    B = bitxor_inexact(B, cast(signB,'like',Ox0FFF) * Ox0FFF, pi, ...
        class(B), 1:nb);    % ones complement
    B = Adder_RCA_inexact(nb,B,zeros('like',B),signB,pi,1:nb,false);    % twos complement
else
    OxFFFF = intmax(class(B));
    OxF000 = bitshift(OxFFFF,nb);
    Ox0FFF = bitcmp(OxF000);
    B = bitand(B,Ox0FFF);
end

na = na - signedA;
nb = nb - signedB;
np = na + nb + (signedA || signedB);

bit = min([bit,np]);

if np <= 8
    np0 = 8;
    P = zeros(size(A),'uint8');
    A = uint8(A);
elseif np <= 16
    np0 = 16;
    P = zeros(size(A),'uint16');
    A = uint16(A);
elseif np <= 32
    np0 = 32;
    P = zeros(size(A),'uint32');
    A = uint32(A);
elseif np <= 64
    np0 = 64;
    P = zeros(size(A),'uint64');
    A = uint64(A);
else
    P = zeros(size(A),'double');
    A = double(A);
end

np = na + nb;

allbits0 = zeros(size(A),'like',A);
allbits1 = allbits0;
allbits1(:) = pow2(na) - 1;

for i = 1 : nb
    j = logical(bitget(B(:),i));
    if i == 1
        P(j) = Adder_RCA_inexact(np, bitshift(bitand(A(j), ...
            allbits1(j)), i-1), P(j), [], p, i:min([i+na-1,bit]), true);
    else
        P(j) = Adder_RCA_inexact(np, bitshift(bitand(A(j), ...
            allbits1(j)), i-1), P(j), [], p, i:min([i+na-1,bit]), false);
    end
end

if signedA || signedB
    signP = xor_inexact(signA,signB,pi);
    OxFFFF = intmax(class(P));
    P = bitxor_inexact(P, cast(signP,'like',OxFFFF) * OxFFFF, p, ...
        class(P), 1:bit);    % ones complement
    P = Adder_RCA_inexact(np0,P,zeros('like',P),signP,p,1:bit,false);    % twos complement
    P = signed(P);
end

P = correct_upperbits(P,np);
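The core loop of the shift-and-add listing can be illustrated with an exact, unsigned-only sketch. This is not the author's function: the inexact adder Adder_RCA_inexact is replaced here by ordinary integer addition, sign handling and the bit limit are omitted, and the operand values and widths are assumptions chosen so the products fit in uint16.

```matlab
% Exact unsigned shift-and-add multiplication (illustrative sketch only).
% A plain "+" stands in for Adder_RCA_inexact, so no errors are injected.
A  = uint16([3 5 255]);    % multiplicands (assumed 8-bit values)
B  = uint16([7 0 255]);    % multipliers   (assumed 8-bit values)
nb = 8;                    % number of multiplier bits to scan

P = zeros(size(A), 'uint16');
for i = 1:nb
    j = logical(bitget(B, i));            % elements whose i-th multiplier bit is 1
    P(j) = P(j) + bitshift(A(j), i-1);    % add the shifted multiplicand
end

disp(P);    % element-wise products of A and B
```

As in the listing, the logical mask j lets each pass update only the elements whose current multiplier bit is set, so the loop runs nb times regardless of array size.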
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3.2  Wallace  Tree  Multiplier 


3.2.1  Main  Function. 

function [ P ] = multiplier_wallace_tree_inexact_PBL( A, B, p )

% Operand widths are taken from the integer classes of A and B.
switch class(A)
    case {'int8', 'uint8'}
        na = 8;
    case {'int16', 'uint16'}
        na = 16;
    case {'int32', 'uint32'}
        na = 32;
    case {'int64', 'uint64'}
        na = 64;
    otherwise
        error 'Multiplicands must be of the integer classes.'
end

switch class(B)
    case {'int8', 'uint8'}
        nb = 8;
    case {'int16', 'uint16'}
        nb = 16;
    case {'int32', 'uint32'}
        nb = 32;
    case {'int64', 'uint64'}
        nb = 64;
    otherwise
        error 'Multiplicands must be of the integer classes.'
end

n = max([na, nb]);

switch n
    case 8
        classname0 = 'uint8';
        classname = 'uint16';
        A = uint16(A(:));  B = uint16(B(:));
    case 16
        classname0 = 'uint16';
        classname = 'uint32';
        A = uint32(A(:));  B = uint32(B(:));
    case 32
        classname0 = 'uint32';
        classname = 'uint64';
        A = uint64(A(:));  B = uint64(B(:));
    case 64
        classname0 = 'uint64';
        classname = 'double';
        A = double(A(:));  B = double(B(:));
end

P = zeros(size(A), classname);
r = 1 : nb;
t = zeros(size(r), classname);
t(:) = intmax(classname0);
t = bitshift(t, r-1);

% Partial products: one column of pp per bit of B.
pp = t(1) * bitget(repmat(B, [1,nb]), repmat(r, [numel(B),1]));
pp = bitand_inexact(pp, repmat(A, [1,nb]), p, classname, 1:n);
pp = bitshift(pp, repmat(r-1, [numel(B),1]));

% Reduce the partial products three at a time until only two remain.
npp = size(pp,2);
while npp > 2
    j = 1;
    for k = 1 : 3 : npp
        if k <= (npp - 2)
            [pp(:,j), pp(:,j+1), t(j), t(j+1)] = wallace_1bit_adder_inexact_PBL( ...
                pp(:,k), pp(:,k+1), pp(:,k+2), t(k), t(k+1), t(k+2), p);
            npp2 = j + 1;
            r(j:(j+1)) = r((k+1):(k+2));
            j = j + 2;
        elseif k == (npp - 1)
            pp(:,j:(j+1)) = pp(:,k:(k+1));
            npp2 = j + 1;
            r(j:(j+1)) = r(k:(k+1));
            t(j:(j+1)) = t(k:(k+1));
            j = j + 2;
        else
            pp(:,j) = pp(:,k);
            npp2 = j;
            r(j) = r(k);
            t(j) = t(k);
            j = j + 1;
        end
    end
    pp = pp(:,1:npp2);
    r = r(1:npp2);
    t = t(1:npp2);
    npp = size(pp,2);
end

% Final addition of the two remaining rows.
logn2 = 2 * log2(n) - 1;
if n <= 16
    logn2mask1 = pow2(2*n - logn2) - 1;
else
    logn2mask1 = intmax(classname);
    for i = (2*n - logn2 + 1) : (2*n)
        logn2mask1 = bitset(logn2mask1, i, 0);
    end
end
logn2mask2 = pow2(logn2) - 1;
P(:) = ling_adder_inexact_PBL(bitshift(pp(:,1),-logn2), bitshift(pp(:,2),-logn2), [], p);
P(:) = bitand(P(:), logn2mask1);
P(:) = bitshift(P(:), logn2);
P(:) = bitor(bitor(P(:), bitand(pp(:,1),logn2mask2)), bitand(pp(:,2),logn2mask2));

3.2.2  1-Bit  Adder  Subfunction. 

function [ S, Cout, Smask, Coutmask ] = wallace_1bit_adder_inexact_PBL( ...
    A, B, Cin, Amask, Bmask, Cinmask, p )

fullmask  = bitand(bitand(Amask,Bmask),Cinmask);
halfmask1 = bitand(bitand(Amask,Bmask),bitcmp(Cinmask));
halfmask2 = bitand(bitand(Amask,Cinmask),bitcmp(Bmask));
halfmask3 = bitand(bitand(Bmask,Cinmask),bitcmp(Amask));
Smask     = bitor(bitor(Amask,Bmask),Cinmask);
Coutmask  = bitshift(bitor(bitor(fullmask,halfmask1),bitor(halfmask2,halfmask3)),1);
noaddmask = bitand(Smask,majority(bitcmp(Amask),bitcmp(Bmask),bitcmp(Cinmask)));
AxorB = bitand(bitxor_inexact(A,B,p),fullmask);
Sn = bitand(bitor_inexact(bitor_inexact(A,B,p),Cin,p),noaddmask);
Sf = bitand(bitxor_inexact(AxorB,Cin,p),fullmask);
S1 = bitand(bitxor_inexact(A,B,p),halfmask1);
S2 = bitand(bitxor_inexact(A,Cin,p),halfmask2);
S3 = bitand(bitxor_inexact(B,Cin,p),halfmask3);
Coutf = bitand(bitor_inexact(bitand_inexact(AxorB,Cin,p), ...
    bitand_inexact(A,B,p),p),fullmask);
Cout1 = bitand(bitand_inexact(A,B,p),halfmask1);
Cout2 = bitand(bitand_inexact(A,Cin,p),halfmask2);
Cout3 = bitand(bitand_inexact(B,Cin,p),halfmask3);
S = bitor(bitor(bitor(Sf,S1),bitor(S2,S3)),Sn);
Cout = bitshift(bitor(bitor(Coutf,Cout1),bitor(Cout2,Cout3)),1);
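The reduction performed by the Wallace tree can be illustrated with a short exact sketch. The Python below is not a transcription of the MATLAB listings: it omits the bit masks and the correctness probability p, and the names carry_save_add and wallace_multiply are hypothetical. It only shows how groups of three partial products are compressed to two (a sum word and a shifted carry word) until a single ordinary addition remains.

```python
def carry_save_add(a, b, c):
    """3:2 compressor applied bitwise, so that a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def wallace_multiply(a, b, nbits):
    """Multiply two unsigned nbits-wide integers by partial-product reduction."""
    # One partial product per bit of b, already shifted into position.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(nbits)]
    # Reduce groups of three rows to two until at most two rows remain.
    while len(rows) > 2:
        reduced = []
        for k in range(0, len(rows), 3):
            group = rows[k:k + 3]
            if len(group) == 3:
                reduced.extend(carry_save_add(*group))
            else:
                reduced.extend(group)  # leftover rows pass through unchanged
        rows = reduced
    # A single carry-propagating addition finishes the product.
    return sum(rows)
```

In the inexact version, each XOR, AND, and OR inside the compressor would be replaced by its probabilistic counterpart.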
3.3 Baugh-Wooley Multiplier


The Baugh-Wooley multiplier is capable of directly multiplying signed integers.
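As an illustration of how signed operands can be multiplied directly, the following Python sketch implements the textbook (modified) Baugh-Wooley arrangement with exact logic. It is an assumption-laden illustration rather than the dissertation's implementation: the function name baugh_wooley_multiply is hypothetical, and no probabilistic gates are modeled. Partial products that involve exactly one sign bit are complemented, and two constant 1s are added in columns n and 2n-1.

```python
def baugh_wooley_multiply(a, b, n):
    """Multiply two n-bit two's-complement integers; returns a signed 2n-bit value."""
    mask = (1 << n) - 1
    abits = [((a & mask) >> i) & 1 for i in range(n)]
    bbits = [((b & mask) >> i) & 1 for i in range(n)]
    total = 0
    # Ordinary AND partial products for the non-sign bits.
    for i in range(n - 1):
        for j in range(n - 1):
            total += (abits[i] & bbits[j]) << (i + j)
    # The product of the two sign bits keeps its positive weight.
    total += (abits[n - 1] & bbits[n - 1]) << (2 * n - 2)
    # Partial products involving exactly one sign bit are complemented (NAND).
    for i in range(n - 1):
        total += (1 - (abits[i] & bbits[n - 1])) << (i + n - 1)
        total += (1 - (abits[n - 1] & bbits[i])) << (i + n - 1)
    # Correction constants: a 1 in column n and a 1 in column 2n-1.
    total += (1 << n) + (1 << (2 * n - 1))
    total &= (1 << (2 * n)) - 1
    # Reinterpret the 2n-bit pattern as a signed integer.
    if total >> (2 * n - 1):
        total -= 1 << (2 * n)
    return total
```

Because every term is added with positive weight, the array needs no subtractors, which is the property that makes the scheme attractive in hardware.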

function [ P ] = multiplier_baugh_wooley_inexact_PBL( A, B, p )

switch class(A)
    case {'int8', 'uint8'}
        na = 8;
    case {'int16', 'uint16'}
        na = 16;
    case {'int32', 'uint32'}
        na = 32;
    case {'int64', 'uint64'}
        na = 64;
    otherwise
        error 'Multiplicands must be of the integer classes.'
end

switch class(B)
    case {'int8', 'uint8'}
        nb = 8;
    case {'int16', 'uint16'}
        nb = 16;
    case {'int32', 'uint32'}
        nb = 32;
    case {'int64', 'uint64'}
        nb = 64;
    otherwise
        error 'Multiplicands must be of the integer classes.'
end

switch class(A)
    case {'int8', 'int16', 'int32', 'int64'}
        sa = true;
    case {'uint8', 'uint16', 'uint32', 'uint64'}
        sa = false;
end

switch class(B)
    case {'int8', 'int16', 'int32', 'int64'}
        sb = true;
    case {'uint8', 'uint16', 'uint32', 'uint64'}
        sb = false;
end

if (na ~= nb) && (sa ~= sb)
    error 'If using signed integers, both multiplicands must be of the same class.'
end

n = na + nb;
if n <= 8
    classname = 'int8';
elseif n <= 16
    classname = 'int16';
elseif n <= 32
    classname = 'int32';
elseif n <= 64
    classname = 'int64';
else
    classname = 'double';
end

if ~(sa || sb)
    classname = regexprep(classname, 'int', 'uint');
end

if sa
    Ox7FFF = intmax(class(A));
    Ox8000 = intmin(class(A));
    OxFFFF = -ones('like', A);
else
    Ox8000 = bitset(0, na, class(A));
    OxFFFF = intmax(class(A));
    Ox7FFF = bitxor(OxFFFF, Ox8000);
end

if ~exist('p', 'var')
    p = 1;
end

P = zeros(size(A), classname);
bb = zeros(size(A), 'like', A);
bb(:) = bitget(B(:), 1);
ss = bitand_inexact(A(:), OxFFFF*bb(:), p, class(A));
if sa
    ss = bitxor(ss, Ox8000);        % NAND at uppermost bit
end
P(:) = bitget(ss, 1);
ss = bitshift(ss, -1);
ss = bitand(ss, Ox7FFF);            % 0 at uppermost bit

bb(:) = bitget(B(:), 2);
ab = bitand_inexact(A(:), OxFFFF*bb(:), p, class(A));
if sa
    ab = bitxor(ab, Ox8000);        % NAND at uppermost bit
end
cc = bitand_inexact(ab, ss, p, class(A), 1:(na-1));   % half adder: ab + ss
ss = bitxor_inexact(ab, ss, p, class(A), 1:(na-1));
P(:) = bitset(P(:), 2, bitget(ss, 1));
ss = bitshift(ss, -1);
ss = bitand(ss, Ox7FFF);            % 0 at uppermost bit

for i = 3 : nb
    bb(:) = bitget(B(:), i);
    ab = bitand_inexact(A(:), OxFFFF*bb(:), p, class(A));
    if sa
        ab = bitxor(ab, Ox8000);    % NAND at uppermost bit
    end
    if (i == nb) && sb
        ab = bitxor(ab, OxFFFF);    % flip all bits (i.e., 1's complement)
    end
    dd = bitxor_inexact(ab, ss, p, class(A), 1:(na-1));   % full adder: ab + ss + cc
    ee = bitand_inexact(ab, ss, p, class(A), 1:(na-1));
    ff = bitand_inexact(dd, cc, p, class(A));
    ss = bitxor_inexact(dd, cc, p, class(A));
    cc = bitor_inexact(ee, ff, p, class(A));
    P(:) = bitset(P(:), i, bitget(ss, 1));
    ss = bitshift(ss, -1);
    if (i < nb) || (~sa)
        ss = bitand(ss, Ox7FFF);    % 0 at uppermost bit
    else
        ss = bitor(ss, Ox8000);     % 1 at uppermost bit
    end
end

dd = bitxor_inexact(cc, ss, p, class(A), 2:(na-1));
ee = bitand_inexact(cc, ss, p, class(A), 2:(na-1));
c = ones('like', A);
c(:) = sb;
for i = 1 : na
    d = bitget(dd, i);
    e = bitget(ee, i);
    f = and_inexact(c, d, p);
    s = xor_inexact(c, d, p);
    c = or_inexact(e, f, p);
    P(:) = bitset(P(:), nb + i, s);
end
Appendix D. Inexact Floating-Point Multiplier


function [ Sp, Ep, Mp ] = Multiplier_floating_inexact( Sa, Ea, Ma, Sb, Eb, Mb, fmt, p )

switch upper(fmt)
    case 'BINARY16'
        ne = 5;
        nm = 10;
    case 'BINARY32'
        ne = 8;
        nm = 23;
    case 'BINARY64'
        ne = 11;
        nm = 52;
    case 'BINARY128'
        ne = 15;
        nm = 112;
    otherwise
        error 'fmt must be binary16, binary32, binary64, or binary128.'
end

if ~exist('p', 'var')
    p = 1;
end

OxFF = cast(pow2a(ne, 'uint64') - 1, 'like', Ea);
Ox7FFFFF = cast(pow2a(nm, 'uint64') - 1, 'like', Ma);

% compute sign bit
Sp1 = sign_logic(Sa, Sb, p);

% compute mantissa
Ma1 = prepend_mantissa(Ea, Ma, ne, nm, p);
Mb1 = prepend_mantissa(Eb, Mb, ne, nm, p);
[Mp1, c] = multiply_mantissas(Ma1, Mb1, nm, p);

% compute exponent
[Ep1, u, v] = add_exponents(Ea, Eb, ne, c, p);

% underflow and overflow
u_  = cast(u, 'like', Ea) * OxFF;
u__ = cast(u, 'like', Ma) * Ox7FFFFF;
v_  = cast(v, 'like', Ea) * OxFF;
v__ = cast(v, 'like', Ma) * Ox7FFFFF;

% output Inf in case of overflow
Ep2 = mux2_inexact(v_, Ep1, OxFF, p, class(Ep1), 1:ne);
Mp2 = mux2_inexact(v__, Mp1, zeros('like', Mp1), p, class(Mp1), 1:nm);

% output 0 in case of underflow
Sp = mux2_inexact(u, Sp1, false, p, 'logical', 1);
Ep = mux2_inexact(u_, Ep2, zeros('like', Ep2), p, class(Ep2), 1:ne);
Mp = mux2_inexact(u__, Mp2, zeros('like', Mp2), p, class(Mp2), 1:nm);

end

function Sp = sign_logic( Sa, Sb, p )
Sp = xor_inexact(Sa, Sb, p);
end

function [Ep, u, v] = add_exponents( Ea, Eb, ne, c, p )
[~, E0] = adder_inexact_PBL('RC', ne, Ea, Eb, c, p);
v = and_inexact(bitget(E0,ne), bitget(E0,ne+1), p);     % overflow
Ep = zeros(size(E0), 'like', Ea);
OxFFFF = bitcmp0(zeros('like', E0), ne);
minus127 = zeros('like', E0);
minus127(:) = pow2(ne) + pow2(ne-1) + 1;                % bin2dec('110000001')
E0 = adder_inexact_PBL('RC', ne+1, E0, minus127, [], p);
u = logical(bitget(E0, ne+1));                          % underflow
Ep(:) = bitand(E0, OxFFFF);
end

function Ma1 = prepend_mantissa( Ea, Ma, ne, nm, p )
% For all nonzero Ea, a 1 is prepended to the mantissa Ma.  For Ea == 0, the
% mantissa is returned unchanged.
H = any_high_bits_inexact_PBL(Ea, ne, p);
switch class(Ma)
    case 'uint8'
        if nm == 8
            Ma = uint16(Ma);
        end
    case 'uint16'
        if nm == 16
            Ma = uint32(Ma);
        end
    case 'uint32'
        if nm == 32
            Ma = uint64(Ma);
        end
    case 'uint64'
        if nm == 64
            Ma = double(Ma);
        end
end
Ma1 = bitset(Ma, nm+1, H);
end

function [ Mp, c ] = multiply_mantissas( Ma1, Mb1, nm, p )
Mp0 = Multiplier_basic_inexact(Ma1, Mb1, p);
c = zeros(size(Mp0), 'like', Mp0);
c(:) = bitget(Mp0, 2*(nm+1));                           % carry-out bit
Mp0 = bitshifter_inexact_PBL(Mp0, int8(c), 2*(nm+1), 1, p);
OxFFFF = bitcmp0(zeros('like', Mp0), nm-1);
lsb0 = logical(bitget(Mp0, nm+1));
roundbit0 = logical(bitget(Mp0, nm));
stickybit0 = any_high_bits_inexact_PBL(bitand(Mp0, OxFFFF), nm-1, p);
Mp0 = cast(bitshift(Mp0, -nm), 'like', Ma1);
Mp0 = adder_inexact_PBL('RC', nm+2, Mp0, zeros('like', Mp0), ...
    and(roundbit0, or(lsb0, stickybit0)), p);
Mp = zeros(size(Mp0), 'like', Ma1);
OxFFFF = bitcmp0(zeros('like', Mp0), nm);
Mp(:) = bitand(Mp0, OxFFFF);
c = logical(c);
end
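The rounding step in multiply_mantissas combines a least-significant bit, a round bit, and a sticky bit. A minimal exact sketch of that decision in Python is given below. The function name round_nearest_even is hypothetical and the sketch assumes plain non-negative integers; it is meant only to show why the increment is round AND (lsb OR sticky), which yields round-to-nearest with ties going to the even result.

```python
def round_nearest_even(product, nm):
    """Drop the low nm bits of product, rounding to nearest with ties to even."""
    lsb = (product >> nm) & 1                 # lowest bit that is kept
    round_bit = (product >> (nm - 1)) & 1     # highest bit that is dropped
    sticky = int((product & ((1 << (nm - 1)) - 1)) != 0)  # any lower dropped bit set
    increment = round_bit & (lsb | sticky)
    return (product >> nm) + increment
```

For example, with nm = 2 the value 0b1010 (2.5 in units of the kept bits) rounds down to 2, while 0b1110 (3.5) rounds up to 4.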
Appendix E. Inexact Matrix Multiplier


function [ C, nc ] = mtimes_inexact_PBL( A, B, na, nb, p, bit )
% Performs matrix multiplication, similar to the Matlab *
% (asterisk) operator or mtimes function, except inexact
% adders and multipliers are used in the process.  The inputs
% A and B must be of the integer classes.  Inputs A and B can
% have more than two dimensions.
%
% Inputs:
%   A, B: (integer arrays) Matrices to be multiplied.
%         size(A,2) must equal size(B,1).  For higher dimensions,
%         size(A,3) must equal size(B,3); size(A,4) must equal
%         size(B,4), etc.
%
%   na:   (integer) Number of bits needed to store the data in A.
%   nb:   (integer) Number of bits needed to store the data in B.
%   p:    (real number) Probability of correctness of each
%         binary operation within the inexact adders and multipliers.
%   bit:  (integer or integer array) The highest-order bit
%         which can be inexact within the addition and multiplication
%         operations.  This can either be a scalar, or else an array
%         with dimensions [size(A,1),size(B,2),size(A,3),size(A,4)], etc.
%
% Outputs:
%
%   C:    (integer array) Matrix product of A and B.  Dimensions
%         of C are [size(A,1),size(B,2),size(A,3),size(A,4)], etc.
%         C may be of a different integer class than A and B.
%   nc:   (integer) Number of bits needed to store the data
%         in C.  nc = na + nb + ceil(log2(size(A,2))).

if ~exist('bit', 'var')
    bit = Inf;
end

if isscalar(na)
    na = repmat(na, [size(A,1), size(B,2)]);
end

if isscalar(nb)
    nb = repmat(nb, [size(A,1), size(B,2)]);
end

if isscalar(bit)
    bit = repmat(bit, [size(A,1), size(B,2)]);
end

switch class(A)
    case {'int8', 'int16', 'int32', 'int64'}
        sa = true;
    case {'uint8', 'uint16', 'uint32', 'uint64'}
        sa = false;
end

switch class(B)
    case {'int8', 'int16', 'int32', 'int64'}
        sb = true;
    case {'uint8', 'uint16', 'uint32', 'uint64'}
        sb = false;
end

if (sa ~= sb) && ~strcmp(class(A), class(B))
    error 'If using signed integers, both multiplicands must be of the same class.'
end

nc = na + nb + ceil(log2(size(A,2)));
nc(nc > 64) = 64;
ncmax = max(nc(:));
if ncmax <= 8
    classname = 'int8';
elseif ncmax <= 16
    classname = 'int16';
elseif ncmax <= 32
    classname = 'int32';
else
    classname = 'int64';
end

if ~(sa || sb)
    classname = regexprep(classname, '^int', 'uint');
end

if (~isscalar(A)) && (~isscalar(B)) && (size(A,2) ~= size(B,1))
    error 'Inner matrix dimensions must agree.'
end

C = zeros(size(A,1), size(B,2), size(A,3), size(A,4), classname);

for r = 1 : size(A,1)
    for c = 1 : size(B,2)
        % Element-wise products of row r of A with column c of B.
        A_ = permute(A(r,:,:,:), [2,1,3,4]);
        Cc = Multiplier_basic_inexact(A_, B(:,c,:,:), na(r,c), nb(r,c), p, bit(r,c));
        % Sign-extend (or clear) the bits above na + nb.
        OxFFFF = cast(-1, 'like', Cc);
        OxF000 = bitshift(OxFFFF, na(r,c)+nb(r,c));
        Ox0FFF = bitcmp(OxF000);
        scc = (Cc < 0);
        Cc(scc) = bitor(Cc(scc), OxF000);
        Cc(~scc) = bitand(Cc(~scc), Ox0FFF);
        % Accumulate the products with the inexact ripple-carry adder.
        C(r,c,:,:) = Cc(1,1,:,:);
        for i = 2 : size(Cc,1)
            nc_ = na(r,c) + nb(r,c) + ceil(log2(i));
            C(r,c,:,:) = Adder_RCA_inexact(nc_, C(r,c,:,:), ...
                cast(Cc(i,1,:,:),'like',C), [], p, 1:min([nc_, bit(r,c)]));
        end
    end
end
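The output width rule nc = na + nb + ceil(log2(size(A,2))) can be checked with a few lines of Python. This is a sketch under stated assumptions: the names product_bits and storage_class are hypothetical, and the 64-bit cap and class selection simply mirror what the listing above does. Each product needs na + nb bits, and summing K of them adds at most ceil(log2(K)) further bits.

```python
def product_bits(na, nb, inner):
    """Bits needed for a dot product of `inner` terms of na-bit by nb-bit values."""
    growth = (inner - 1).bit_length()   # equals ceil(log2(inner)) for inner >= 1
    return min(na + nb + growth, 64)    # capped at 64, as in the listing


def storage_class(nc, signed=True):
    """Smallest integer class name that holds nc bits."""
    prefix = 'int' if signed else 'uint'
    for width in (8, 16, 32):
        if nc <= width:
            return prefix + str(width)
    return prefix + '64'
```

For an 8-by-8 product of 8-bit data, product_bits(8, 8, 8) gives 19 bits, so the result is stored in a 32-bit class.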
Appendix F. Inexact JPEG Compression Algorithm


Main  Program 


% Source:
% http://www.impulseadventure.com/photo/jpeg-huffman-coding.html

clear; close all
q = 100;

for f = 1
    if f == 1
        fname = 'F16';
    elseif f == 2
        fname = 'Lena';
    elseif f == 3
        fname = 'Frisco';
    else
        fname = 'Mandrill';
    end

    ncomponents = 1;                  % 3 for color, 1 for grayscale
    A = imread([fname, '.tif']);
    % A = A(263:342,247:326,:);       % Lena
    % A = A(18:97,137:216,:);         % Mandrill
    % A = A(522:681,357:516,:);       % Frisco
    height = size(A,1);
    width = size(A,2);
    R = A(:,:,1);  G = A(:,:,2);  B = A(:,:,3);

    fprintf('Color space transformation...\n')
    Yexact = YCbCr(R, G, B);
    figure; image(repmat(Yexact,[1,1,3])); title('Original Image'); axis equal; axis tight; axis off
    [Y, Cb, Cr] = YCbCr_inexact(R, G, B, 1);
    figure; image(repmat(Y,[1,1,3])); title('Color Space Transformation'); axis equal; axis tight; axis off
    uncompressed_size = numel(Y);
    npixels = numel(Y);
    Y1 = tile8x8(Y);
    Y1 = int8(int16(Y1) - 128);

    for p = [0.99,0.999,0.9999,0.99999,0.999999]
        fprintf('Discrete cosine transformation...\n')
        Byexact = DCT0(double(Y1));
        By = DCT_inexact_PBL(Y1, 22, p);
        Qy = quantize0(double(By), q, 'Y');
        scandata = run_amp_huff_all(Qy, [], [], fname, q);
        Aj = imread([fname, '.jpg']);
        figure
        if ncomponents > 1
            image(Aj)
        else
            image(repmat(Aj,[1,1,3]))
            err = double(Aj) - double(Yexact);
        end
        title('Final JPEG Image')
        axis equal; axis tight; axis off
        compressed_size = numel(scandata);
        compression_ratio = uncompressed_size / compressed_size;
        bits_per_pixel = 8 / compression_ratio;
        err_rms = sqrt(mean(err(:).^2));
        snr_dB = 10 * log10((double(max(Yexact(:))) - double(min(Yexact(:)))) ...
            / err_rms);
        fprintf('p = %8.6f\n', p)
        fprintf('Uncompressed size:  %d  bytes\n', uncompressed_size)
        fprintf('Compressed size:    %d  bytes\n', compressed_size)
        fprintf('Compression ratio:  %5.2f\n', compression_ratio)
        fprintf('Bits per pixel:     %4.2f\n', bits_per_pixel)
        fprintf('RMS error:          %7.3f\n', err_rms)
        fprintf('SNR:                %7.3f  dB\n', snr_dB)
        fname2 = sprintf('%s_p=%8.6f_err=%7.3f_comp=%5.2f_snr=%7.3f', ...
            fname, p, err_rms, compression_ratio, snr_dB);
        fname2 = regexprep(fname2, '\s+', '');
        fname2 = regexprep(fname2,'\.
        movefile([fname, '.jpg'], [fname2, '.jpg']);
    end
end
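The figures of merit reported by the main program (RMS error, signal-to-noise ratio, and bits per pixel) can be reproduced on small data with the Python sketch below. The function names are hypothetical and the inputs are flat lists of pixel values rather than images; the SNR expression deliberately mirrors the one in the listing, that is, 10 log10 of the reference dynamic range divided by the RMS error.

```python
import math


def rms_error(reference, test):
    """Root-mean-square difference between two equally long pixel sequences."""
    diffs = [float(t) - float(r) for r, t in zip(reference, test)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def snr_db(reference, test):
    """SNR as computed in the main program: range of the reference over RMS error."""
    span = float(max(reference)) - float(min(reference))
    return 10 * math.log10(span / rms_error(reference, test))


def bits_per_pixel(uncompressed_bytes, compressed_bytes):
    """Average bits per pixel for 8-bit grayscale input."""
    compression_ratio = uncompressed_bytes / compressed_bytes
    return 8 / compression_ratio
```

A 4:1 compression ratio on 8-bit pixels, for instance, corresponds to 2 bits per pixel.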
6.2 Color Space Transformation


6.2.1 Exact Color Space Transformation.

The exact color space transformation is needed in order to compute the errors of the inexact color space transformation.

% Color space transformation

% Reference:
% http://www.jpeg.org/public/jfif.pdf

function [ Y, Cb, Cr ] = YCbCr( R, G, B, approx )

if ~exist('approx', 'var')
    approx = 'exact';
end

switch lower(approx)
    case 'uint8'
        R = uint16(R);
        G = uint16(G);
        B = uint16(B);
        Y = uint8(bitshift(154*R,-9)) + uint8(bitshift(151*G,-8)) + uint8(bitshift(234*B,-11));
        R = int16(R);
        G = int16(G);
        B = int16(B);
        Cb = int8(bitshift(-86*R,-9)) + int8(bitshift(-84*G,-8)) + int8(bitshift(B,-1));
        i = (Cb >= 0);
        Cb = uint8(bitset(Cb,8,0));
        Cb(i) = Cb(i) + uint8(128);
        Cr = int8(bitshift(R,-1)) + int8(bitshift(-107*G,-8)) + int8(bitshift(-83*B,-10));
        i = (Cr >= 0);
        Cr = uint8(bitset(Cr,8,0));
        Cr(i) = Cr(i) + uint8(128);
    otherwise  % exact method
        R = double(R);
        G = double(G);
        B = double(B);
        Y  =   0.299   * R + 0.587   * G + 0.114   * B;
        Cb = - 0.16874 * R - 0.33126 * G + 0.5     * B + 128;
        Cr =   0.5     * R - 0.41869 * G - 0.08131 * B + 128;
        Y  = uint8(Y);
        Cb = uint8(Cb);
        Cr = uint8(Cr);
end
6.2.2 Inexact Color Space Transformation.

% Color space transformation

% Reference:
% http://www.jpeg.org/public/jfif.pdf

function [ Y, Cb, Cr ] = YCbCr_inexact( R, G, B, p )

if ~exist('p', 'var')
    p = 1;
end

R = uint8(R);
G = uint8(G);
B = uint8(B);
Y1 = bitshift(Multiplier_basic_inexact(R,repmat(uint8(154),size(R)),8,8,p),-9);
% always < 77 (7 bits)
Y1 = uint8(bitand(Y1,uint16(127)));                     % lower 7 bits
Y2 = bitshift(Multiplier_basic_inexact(G,repmat(uint8(151),size(G)),8,8,p),-8);
% always < 151 (8 bits)
Y2 = uint8(bitand(Y2,uint16(255)));                     % lower 8 bits
Y3 = bitshift(Multiplier_basic_inexact(B,repmat(uint8(234),size(B)),8,8,p),-11);
% always < 30 (5 bits)
Y3 = uint8(bitand(Y3,uint16(31)));                      % lower 5 bits
Y = Adder_RCA_inexact(8,Y1,Y2,[],p);
Y = Adder_RCA_inexact(8,Y,Y3,[],p);
Cb1 = bitshift(Multiplier_basic_inexact(R,repmat(uint8(86),size(R)),8,7,p),-9);
% always < 43 (6 bits)
Cb1 = uint8(bitand(Cb1,uint16(63)));                    % lower 6 bits
Cb2 = bitshift(Multiplier_basic_inexact(G,repmat(uint8(84),size(G)),8,7,p),-8);
% always < 84 (7 bits)
Cb2 = uint8(bitand(Cb2,uint16(127)));                   % lower 7 bits
Cb2 = Adder_RCA_inexact(7,Cb2,Cb1,[],p,1:7,true);       % always <= 127
Cb2 = bitand(Cb2,uint8(127));                           % lower 7 bits
Cb3 = bitshift(B,-1);                                   % always <= 127
Cb3 = bitor(Cb3,128);                                   % add 128
Cb = adder_subtractor_inexact_PBL('RC',8,Cb3,Cb2,true,p);

Cr1 = bitshift(Multiplier_basic_inexact(B,repmat(uint8(83),size(B)),8,7,p),-10);
% always < 21 (5 bits)
Cr1 = uint8(bitand(Cr1,uint16(31)));                    % lower 5 bits
Cr2 = bitshift(Multiplier_basic_inexact(G,repmat(uint8(107),size(G)),8,7,p),-8);
% always < 107 (7 bits)
Cr2 = uint8(bitand(Cr2,uint16(127)));                   % lower 7 bits
Cr2 = Adder_RCA_inexact(7,Cr2,Cr1,[],p,1:7,true);       % always <= 127
Cr2 = bitand(Cr2,uint8(127));                           % lower 7 bits
Cr3 = bitshift(R,-1);                                   % always <= 127
Cr3 = bitor(Cr3,128);                                   % add 128
Cr = adder_subtractor_inexact_PBL('RC',8,Cr3,Cr2,true,p);
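The integer constants used for the luminance channel (154 with a 9-bit shift, 151 with an 8-bit shift, 234 with an 11-bit shift) are fixed-point approximations of the JFIF weights 0.299, 0.587, and 0.114. The Python sketch below compares the two with exact arithmetic, that is, with p = 1 and no probabilistic gates; the function names y_exact and y_fixed_point are hypothetical.

```python
def y_exact(r, g, b):
    """JFIF luminance with floating-point weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def y_fixed_point(r, g, b):
    """Luminance using the multiply-and-shift constants from the listing."""
    return ((154 * r) >> 9) + ((151 * g) >> 8) + ((234 * b) >> 11)
```

For 8-bit inputs the three terms stay below 77, 151, and 30 respectively, in agreement with the comments in the listing, and the fixed-point result differs from the exact luminance by at most 3 gray levels.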
6.3 Tiling Function


function [ B ] = tile8x8( A )

% See if the dimensions of A are divisible by 8.  If not,
% pad with zeros.
A = padarray(A, mod(-size(A),8), 'post');

% Break A up into 8x8 tiles.  Dimensions: yminor, ymajor,
% xminor, xmajor
B = reshape(A, [8, size(A,1)/8, 8, size(A,2)/8]);

% Rearrange dimensions: xminor, yminor, xmajor, ymajor
B = permute(B, [3, 1, 4, 2]);
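The tiling step can be pictured with a plain-Python sketch that works on nested lists. It is only an illustration: the index order tiles[ty][tx][y][x] is an assumption chosen for readability and differs from the [xminor, yminor, xmajor, ymajor] order produced by the MATLAB permute call.

```python
def tile8x8(image):
    """Split a 2-D list into 8x8 tiles, zero-padding the bottom and right edges."""
    height = len(image)
    width = len(image[0]) if height else 0
    padded_h = -(-height // 8) * 8   # round up to a multiple of 8
    padded_w = -(-width // 8) * 8
    padded = [[image[y][x] if y < height and x < width else 0
               for x in range(padded_w)]
              for y in range(padded_h)]
    # tiles[ty][tx] is one 8x8 block, stored as a list of 8 rows.
    return [[[row[tx * 8:(tx + 1) * 8] for row in padded[ty * 8:(ty + 1) * 8]]
             for tx in range(padded_w // 8)]
            for ty in range(padded_h // 8)]
```

A 10-by-9 image, for example, is padded to 16 by 16 and yields a 2-by-2 grid of tiles.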
6.4 Discrete Cosine Transformation (DCT)


6.4.1 Exact DCT.


The exact DCT is needed in order to compute the errors of the inexact DCT.


% There are four types of DCT -- this is type II (DCT-II).

% References:
%
% http://www.whydomath.org/node/wavlets/dct.html
%
% Rao, K. R. and P. Yip.  Discrete Cosine Transform: Algorithms, Advantages,
% and Applications.  Academic Press, San Diego, CA, 1990, p. 37.

function [ B ] = DCT0( A )

% Build the 8x8 DCT-II matrix U.
U = 1:2:15;
U = ([0:7].') * U;
U = U * pi / 16;
U = 0.5 * cos(U);
U(1,:) = 0.25 * sqrt(2);

if ndims(A) == 2
    B = U * A * (U.');
else
    B = zeros(size(A));
    for i = 1 : size(A,4)
        for j = 1 : size(A,3)
            B(:,:,j,i) = U * squeeze(A(:,:,j,i)) * (U.');
        end
    end
end
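The construction of the transform matrix U and the two-sided product U A U' can be reproduced in plain Python as a cross-check. The helper names below (dct_matrix, matmul, transpose, dct2_8x8) are hypothetical, and the sketch uses nested lists rather than arrays; the coefficient formula is the same as in DCT0.

```python
import math


def dct_matrix():
    """8x8 DCT-II matrix: U[k][n] = 0.5*cos((2n+1)*k*pi/16), first row 0.25*sqrt(2)."""
    u = [[0.5 * math.cos((2 * n + 1) * k * math.pi / 16) for n in range(8)]
         for k in range(8)]
    u[0] = [0.25 * math.sqrt(2)] * 8
    return u


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dct2_8x8(block):
    """Two-dimensional DCT-II of one 8x8 block: U * block * U'."""
    u = dct_matrix()
    return matmul(matmul(u, block), transpose(u))
```

With this scaling U is orthonormal, so a constant block of ones transforms to a single DC coefficient equal to 8 and zeros elsewhere.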
6.4.2 Inexact DCT.


% There are four types of DCT -- this is type II (DCT-II).

% References:
%
% http://www.whydomath.org/node/wavlets/dct.html
%
% Rao, K. R. and P. Yip.  Discrete Cosine Transform: Algorithms, Advantages,
% and Applications.  Academic Press, San Diego, CA, 1990, p. 37.

function [ B ] = DCT_inexact_PBL( A, nbits, p )

% Maximum number of bits used by the DCT in all of the following images:
% Mandrill (4.2.03), Lena (4.2.04), F16 (4.2.05), and San Francisco (2.2.15):

By1max = [17 17 17 17 17 17 17 17; 16 16 16 16 16 16 16 16; ...
          16 16 16 16 16 16 16 16; 16 16 16 16 15 15 15 16; ...
          15 15 15 15 15 15 15 15; 15 15 15 15 15 15 15 15; ...
          15 15 14 15 15 15 14 14; 15 15 14 14 15 14 14 14];
% for U * A, where nu = 7 and na = 8

By2max = [25 25 24 24 23 23 23 23; 25 24 24 23 23 23 23 23; ...
          24 24 23 23 23 22 23 22; 24 23 23 23 23 23 22 22; ...
          23 23 23 22 23 22 22 22; 23 22 22 22 22 22 22 22; ...
          22 22 22 22 22 21 22 21; 22 22 22 21 22 21 21 21];
% for (U*A)*(U.'), where nu = 7, na = 8 and nut = 7

Bymax = [11 11 10 10  9  9  9  9; 11 10 10  9  9  9  9  9; ...
         10 10  9  9  9  8  9  8; 10  9  9  9  9  9  8  8; ...
          9  9  9  8  9  8  8  8;  9  8  8  8  8  8  8  8; ...
          8  8  8  8  8  7  8  7;  8  8  8  7  8  7  7  7];
% for the final (U*A)*(U.')

if ~exist('nbits', 'var')
    nbits = 22;
end

if ~exist('p', 'var')
    p = 1;
end

% Split the bit budget among the data (na) and the two coefficient matrices.
if nbits < 18
    na = floor((nbits + 1) / 3) + 1;
    nu = floor(nbits / 3);
    nut = nbits - na - nu;
else
    na = 8;
    nu = ceil((nbits - na) / 2);
    nut = nbits - na - nu;
end

sa = na - 8;
su = nu;
sut = nut;

A = floor(A * pow2(sa));

U = 1:2:15;
U = ((0:7).') * U;
U = U * pi / 16;
U = 0.5 * cos(U);
U(1,:) = 0.25 * sqrt(2);

% Fixed-point copies of U and its transpose.
U1 = round(pow2(su) * U);
U2 = round(pow2(sut) * U.');

if nu < 8
    U1 = int8(U1);
elseif nu < 16
    U1 = int16(U1);
elseif nu < 32
    U1 = int32(U1);
else
    U1 = int64(U1);
end

if nut < 8
    U2 = int8(U2);
elseif nut < 16
    U2 = int16(U2);
elseif nut < 32
    U2 = int32(U2);
else
    U2 = int64(U2);
end

if ~ismatrix(A)
    U1 = repmat(U1, [1,1,size(A,3),size(A,4)]);
    U2 = repmat(U2, [1,1,size(A,3),size(A,4)]);
end

By1max = By1max + nu + na - 15;
[B,nb] = mtimes_inexact_PBL(U1,A,nu+1,na,p,By1max-3);
By1max = repmat(By1max,[1,1,size(B,3),size(B,4)]);
sgn = (B < 0);
B = bitset(B,By1max,sgn);
B = correct_upperbits(B,By1max);
B = bitshift(B,-8);

By2max = By2max - 8;
By2max = By2max + nut - 7;
B = mtimes_inexact_PBL(B,U2,nb,nut,p,By2max-5);
By2max = repmat(By2max,[1,1,size(B,3),size(B,4)]);
sgn = (B < 0);
B = bitset(B,By2max,sgn);
B = correct_upperbits(B,By2max);
B = bitshift(B,-sa-su-sut+8);

Bymax = repmat(Bymax,[1,1,size(B,3),size(B,4)]);
sgn = (B < 0);
B = bitset(B,Bymax,sgn);
B = correct_upperbits(B,Bymax);
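The way the inexact DCT divides its bit budget and converts the cosine matrix to integers can be sketched in Python as follows. The names allocate_bits and scaled_dct_coefficients are hypothetical, and the rounding helper assumes round-half-away-from-zero behavior to match MATLAB's round; everything else follows the formulas in the listing.

```python
import math


def allocate_bits(nbits):
    """Split nbits among the data (na) and the two coefficient matrices (nu, nut)."""
    if nbits < 18:
        na = (nbits + 1) // 3 + 1
        nu = nbits // 3
    else:
        na = 8
        nu = -(-(nbits - na) // 2)   # ceil((nbits - na) / 2)
    nut = nbits - na - nu
    return na, nu, nut


def round_half_away(x):
    """Round to nearest integer, halves away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def scaled_dct_coefficients(frac_bits):
    """Integer DCT-II matrix: round(2**frac_bits * U)."""
    scale = 1 << frac_bits
    rows = []
    for k in range(8):
        row = []
        for n in range(8):
            if k == 0:
                c = 0.25 * math.sqrt(2)
            else:
                c = 0.5 * math.cos((2 * n + 1) * k * math.pi / 16)
            row.append(round_half_away(scale * c))
        rows.append(row)
    return rows
```

With the default nbits = 22 the split is na = 8, nu = 7, nut = 7, and every scaled coefficient fits in a signed 8-bit integer, consistent with the int8 branch taken when nu < 8.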
6.5 Quantization


% References:
% http://www.ams.org/samplings/feature-column/fcarc-image-compression
% http://www.whydomath.org/node/wavlets/quantization.html
function [ Q, DQT ] = quantize0( B, q, fun )

if q < 1
    q = 1;
end

if nargin < 3
    fun = 'luminance';
end

switch upper(fun)
    case {'CHROMINANCE','CB','CR'}
        Z = [17 18 24 47 99 99 99 99; 18 21 26 66 99 99 99 99; ...
             24 26 56 99 99 99 99 99; 47 66 99 99 99 99 99 99; ...
             99 99 99 99 99 99 99 99; 99 99 99 99 99 99 99 99; ...
             99 99 99 99 99 99 99 99; 99 99 99 99 99 99 99 99].';
    otherwise % luminance
        Z = [16 11 10 16 24  40  51  61; 12 12 14 19  26  58  60  55; ...
             14 13 16 24 40  57  69  56; 14 17 22 29  51  87  80  62; ...
             18 22 37 56 68 109 103  77; 24 35 55 64  81 104 113  92; ...
             49 64 78 87 103 121 120 101; 72 92 95 98 112 100 103 99].';
end

% Map the quality setting q (1 to 100) to the table multiplier alpha.
if q <= 50
    alpha = 50 / q;
else
    alpha = 2 - q / 50;
end

if q >= 100
    Q = round(B);
    DQT = ones(size(Z));
else
    DQT = round(alpha * Z);
    if ~ismatrix(B)
        Z = repmat(Z, [1, 1, size(B,3), size(B,4)]);
    end
    Q = round(B ./ (alpha * Z));
end

% Baseline JPEG coefficients must fit in 12 bits (magnitude <= 2047).
if all((abs(Q(:)) <= 2047) & (~isnan(Q(:))))
    Q = int16(Q);
else
    warning('Quantized data out of range.')
    Q((abs(Q) > 2047) | isnan(Q)) = 0;
    Q = int16(Q);
end
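
The quality-to-multiplier rule in quantize0 is easy to check by hand. The following Python sketch is an independent cross-check, not part of the original MATLAB code; the names quality_scale, matlab_round, and scaled_row are illustrative assumptions. It reproduces the mapping from q to alpha and applies it to the first row of the luminance table, assuming MATLAB's round-half-away-from-zero behavior.

```python
import math

# First row of the luminance table used in the quantize0 listing.
LUMA_ROW = [16, 11, 10, 16, 24, 40, 51, 61]

def quality_scale(q):
    """Map a quality setting q to the multiplier alpha used in quantize0."""
    q = max(q, 1)  # the listing raises any q below 1 to 1
    return 50.0 / q if q <= 50 else 2.0 - q / 50.0

def matlab_round(x):
    """Round half away from zero, as MATLAB's round does (Python's round differs)."""
    return int(math.floor(abs(x) + 0.5)) * (1 if x >= 0 else -1)

def scaled_row(row, q):
    """Scaled table entries; q >= 100 gives all ones, i.e. rounding only."""
    if q >= 100:
        return [1] * len(row)
    alpha = quality_scale(q)
    return [matlab_round(alpha * z) for z in row]

if __name__ == "__main__":
    for q in (25, 50, 75, 100):
        print(q, quality_scale(q), scaled_row(LUMA_ROW, q))
```

With q = 50 the table is used unchanged; lower q scales the divisors up (coarser quantization) and higher q scales them down.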

6.6 Zigzag Function

% Reference:
% http://www.ams.org/samplings/feature-column/fcarc-image-compression
function [ Q1 ] = zigzag8x8( Q )

% zigzag(r,c) is the position of tile element (r,c) in the zigzag sequence.
zigzag = [ 1,  2,  6,  7, 15, 16, 28, 29; ...
           3,  5,  8, 14, 17, 27, 30, 43; ...
           4,  9, 13, 18, 26, 31, 42, 44; ...
          10, 12, 19, 25, 32, 41, 45, 54; ...
          11, 20, 24, 33, 40, 46, 53, 55; ...
          21, 23, 34, 39, 47, 52, 56, 61; ...
          22, 35, 38, 48, 51, 57, 60, 62; ...
          36, 37, 49, 50, 58, 59, 63, 64 ];

Q1 = zeros(size(Q), class(Q));

if ismatrix(Q)   % two-dimensional, one 8x8 tile only
    Q1(zigzag) = Q;
else             % four-dimensional, entire image
    % Offset the indices by 64 per tile so that each tile is reordered in place.
    [~,z2,z3] = ndgrid(1:64, (0:(size(Q,3)-1))*64, ...
        (0:(size(Q,4)-1))*64*size(Q,3));
    z2 = reshape(z2, size(Q));
    z3 = reshape(z3, size(Q));
    zigzag = repmat(zigzag, [1, 1, size(Q,3), size(Q,4)]) + z2 + z3;
    Q1(zigzag) = Q;
end
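
The index table in zigzag8x8 is hard-coded, so it is worth confirming that it follows the usual anti-diagonal traversal. The Python sketch below is an independent check, not the author's code; the function name zigzag_index_table is an assumption. It generates the 1-based table by walking the anti-diagonals in alternating directions.

```python
def zigzag_index_table(n=8):
    """Return an n-by-n table whose entry (r, c) is the 1-based zigzag position."""
    table = [[0] * n for _ in range(n)]
    k = 1
    for s in range(2 * n - 1):           # s = r + c indexes one anti-diagonal
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:                   # even diagonals run from bottom-left upward
            cells.reverse()
        for r, c in cells:
            table[r][c] = k
            k += 1
    return table

if __name__ == "__main__":
    for row in zigzag_index_table():
        print(" ".join("%2d" % v for v in row))
```

The printed rows can be compared entry by entry with the table in the listing.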

6.7 Run-Amplitude Encoding

% Run-amplitude encoding
% References:
% http://cnx.org/content/m11096/latest/
% http://www.impulseadventure.com/photo/jpeg-huffman-coding.html
function [ jpg, jpgstr ] = run_amp_encode0( Q1, bitstrings )
% bitstrings must be sorted in order by code number.

% A scalar input is treated as a DC coefficient; a vector as AC coefficients.
if numel(Q1) > 1
    dc = false;
else
    dc = true;
end

jpg = cell(2, numel(Q1));
nz = 0;   % length of the current run of zeros
j = 0;    % number of codes emitted so far

for i = 1 : numel(Q1)
    if Q1(i) || dc
        j = j + 1;
        s = numel(dec2bin(abs(Q1(i))));   % amplitude size in bits
        if (16 * nz + s + 1) <= numel(bitstrings)
            jpg{1,j} = bitstrings{16 * nz + s + 1};
        else
            jpg{1,j} = bitstrings{1};
            warning 'Data out of range.'
        end
        if Q1(i) >= 0
            jpg{2,j} = dec2bin(Q1(i));
        else
            jpg{2,j} = dec2bin(bitcmp0(-Q1(i), s), s);
        end
        nz = 0;
    else
        nz = nz + 1;
        if (i == numel(Q1)) || (~sum(abs(Q1(i:end))))
            j = j + 1;
            jpg{1,j} = bitstrings{1};     % end-of-block (EOB)
            if j < numel(Q1)
                jpg(:, (j+1):end) = [];
            end
            break
        elseif nz == 16
            j = j + 1;
            jpg{1,j} = bitstrings{241};   % zero run length (ZRL) (0xF0 + 1)
            nz = 0;
        end
    end
end

if nargout >= 2
    % Concatenate all codes into a single string of '0' and '1' characters.
    jpgstr = char('1' * ones([1, 16*numel(jpg)], 'uint8'));
    k = 1;
    for m = 1 : numel(jpg)
        jpgstr(k:(k+numel(jpg{m})-1)) = jpg{m};
        k = k + numel(jpg{m});
    end
    jpgstr = jpgstr(1:(k-1));
end
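
The amplitude part of each code (the second row of jpg) can be illustrated in isolation. The Python sketch below is an assumption-labeled illustration rather than the author's code; the name amplitude_bits is hypothetical. It follows the baseline JPEG convention: the size s is the bit length of the magnitude, positive values are written in plain binary, and negative values as the s-bit one's complement of the magnitude.

```python
def amplitude_bits(v):
    """Return (size, bits) for a coefficient v under the baseline JPEG convention."""
    if v == 0:
        return 0, ""                      # category 0 carries no amplitude bits
    s = abs(v).bit_length()               # number of bits in the magnitude
    if v > 0:
        bits = format(v, "b")
    else:
        # s-bit one's complement of the magnitude, as bitcmp0(-v, s) does
        bits = format(~(-v) & ((1 << s) - 1), "0{}b".format(s))
    return s, bits

if __name__ == "__main__":
    for v in (5, -5, 1, -1, 0):
        print(v, amplitude_bits(v))
```

In this sketch a zero value gets size 0 and no amplitude bits. In the listing, dec2bin(0) returns '0', so a zero DC difference is assigned a size of one bit.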

6.8 Huffman Encoding

function scandata = run_amp_huff_all( Qy, Qcb, Qcr, fname, q )
% Source:
% http://www.impulseadventure.com/photo/jpeg-huffman-coding.html

if ~exist('Qy','var')
    Qy = [];
end
if ~exist('Qcb','var')
    Qcb = [];
end
if ~exist('Qcr','var')
    Qcr = [];
end

[~, fname] = fileparts(fname);
fname = [fname, '.jpg'];

ncomponents = (~isempty(Qy)) + (~isempty(Qcb)) + (~isempty(Qcr));

height = 8 * size(Qy,4);
width = 8 * size(Qy,3);

Q0 = zigzag8x8(Qy);

load huffman_dc_luminance_sorted
bitstrings_dc_luminance = bitstrings;
load huffman_ac_luminance_sorted
bitstrings_ac_luminance = bitstrings;
load huffman_dc_chrominance_sorted
bitstrings_dc_chrominance = bitstrings;
load huffman_ac_chrominance_sorted
bitstrings_ac_chrominance = bitstrings;

fprintf('Run-amplitude encoding...\n')

jpgstr = char('1' * ones([1,32*numel(Qy)],'uint8'));
k = 1;
prev_Qy_dc = 0;
prev_Qcb_dc = 0;
prev_Qcr_dc = 0;
dc_correction = 0;
for i = 1 : size(Qy,4)
    fprintf('%i ', size(Qy,4)-i);
    if ~mod(i, 10)
        fprintf('\n');
    end
    for j = 1 : size(Qy,3)
        % Luminance: DC difference first, then the 63 AC coefficients.
        Q1 = squeeze(Q0(:,:,j,i));
        [~, jpgstr0] = run_amp_encode0(Q1(1)-prev_Qy_dc, ...
            bitstrings_dc_luminance);
        jpgstr(k:(k+numel(jpgstr0)-1)) = jpgstr0;
        k = k + numel(jpgstr0);
        [~, jpgstr0] = run_amp_encode0(Q1(2:end), ...
            bitstrings_ac_luminance);
        jpgstr(k:(k+numel(jpgstr0)-1)) = jpgstr0;
        k = k + numel(jpgstr0);
        prev_Qy_dc = Q1(1);

        % Very slight error in the dc component -- not sure why.
        % No big deal for small images, but for large images it accumulates
        prev_Qy_dc = prev_Qy_dc - floor(dc_correction);
        if floor(dc_correction) >= 1
            dc_correction = 0;
        else
            dc_correction = dc_correction + 0.02;
        end

        if ncomponents > 1
            Q1 = zigzag8x8(squeeze(Qcb(:,:,j,i)));
            [~, jpgstr0] = run_amp_encode0(Q1(1)-prev_Qcb_dc, ...
                bitstrings_dc_chrominance);
            jpgstr(k:(k+numel(jpgstr0)-1)) = jpgstr0;
            k = k + numel(jpgstr0);
            [~, jpgstr0] = run_amp_encode0(Q1(2:end), ...
                bitstrings_ac_chrominance);
            jpgstr(k:(k+numel(jpgstr0)-1)) = jpgstr0;
            k = k + numel(jpgstr0);
            prev_Qcb_dc = Q1(1);

            Q1 = zigzag8x8(squeeze(Qcr(:,:,j,i)));
            [~, jpgstr0] = run_amp_encode0(Q1(1)-prev_Qcr_dc, ...
                bitstrings_dc_chrominance);
            jpgstr(k:(k+numel(jpgstr0)-1)) = jpgstr0;
            k = k + numel(jpgstr0);
            [~, jpgstr0] = run_amp_encode0(Q1(2:end), ...
                bitstrings_ac_chrominance);
            jpgstr(k:(k+numel(jpgstr0)-1)) = jpgstr0;
            k = k + numel(jpgstr0);
            prev_Qcr_dc = Q1(1);
        end
    end
end

% Trim, byte-stuff, and convert the bit string to a column of byte values.
jpgstr = jpgstr(1:(k-1));
jpgstr = stuffbyte(jpgstr);
jpgstr = reshape(jpgstr, [8, numel(jpgstr)/8]).';
scandata = bin2dec(jpgstr);

fileID = fopen(fname, 'w');
fwrite1_SOI( fileID );
fwrite2_APP0( fileID );

[~, DQT] = quantize0( zeros(8,8), q, 'luminance' );
fwrite3_DQT( fileID, 0, DQT, q, 'luminance' );
if ncomponents > 1
    [~, DQT] = quantize0( zeros(8,8), q, 'chrominance' );
    fwrite3_DQT( fileID, 1, DQT, q, 'chrominance' );
end

fwrite4_SOF0( fileID, height, width, ncomponents );

load('huffman_dc_luminance','DHT')
fwrite5_DHT( fileID, 0, 0, DHT );
load('huffman_ac_luminance','DHT')
fwrite5_DHT( fileID, 0, 1, DHT );
if ncomponents > 1
    load('huffman_dc_chrominance','DHT')
    fwrite5_DHT( fileID, 1, 0, DHT );
    load('huffman_ac_chrominance','DHT')
    fwrite5_DHT( fileID, 1, 1, DHT );
end

fwrite6_SOS( fileID, ncomponents );
fwrite(fileID, scandata);
fwrite(fileID, [255; 217]);   % 0xFFD9 (EOI)
fclose('all');
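
The loop in run_amp_huff_all encodes each tile's DC coefficient as its difference from the DC coefficient of the previous tile of the same component, starting from zero. The Python sketch below illustrates only that differencing step and its inverse; the names dc_differences and dc_reconstruct are assumptions, and the dc_correction adjustment from the listing is deliberately left out.

```python
def dc_differences(dc_values):
    """Replace each DC value by its difference from the previous one (first vs. 0)."""
    prev = 0
    diffs = []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def dc_reconstruct(diffs):
    """Inverse operation, as a decoder would perform it: a running sum."""
    total = 0
    values = []
    for d in diffs:
        total += d
        values.append(total)
    return values

if __name__ == "__main__":
    dc = [12, 15, 15, 9]
    diffs = dc_differences(dc)
    print(diffs, dc_reconstruct(diffs))
```

Because the decoder accumulates the differences, any systematic error in a single encoded difference propagates to every later tile, which is consistent with the accumulation noted in the listing's comments.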

6.9 Stuff Byte

% Reference:
% http://www.impulseadventure.com/photo/jpeg-huffman-coding.html
function [ jpgstr_stuffed ] = stuffbyte( jpgstr )

% Check to see if the number of bits is a multiple of 8.
% If not, then pad with ones.
jpgstr1 = char(padarray(uint8(jpgstr), [0, mod(-numel(jpgstr),8)], ...
    uint8('1'), 'post'));

% Split jpgstr1 up into 8-bit groups.
jpgstr2 = reshape(jpgstr1, [8, numel(jpgstr1)/8]).';

jpgstr_stuffed = [];
for i = 1 : size(jpgstr2, 1)
    jpgstr_stuffed = [jpgstr_stuffed, jpgstr2(i,:)];
    if strcmp(jpgstr2(i,:), '11111111')   % 0xFF becomes 0xFF00 (stuff byte)
        jpgstr_stuffed = [jpgstr_stuffed, '00000000'];
    end
end
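
The padding and stuffing rules of stuffbyte can be restated compactly. The Python sketch below is an illustration under the same conventions (bits held as a string of '0' and '1' characters); the name stuff_bits is an assumption, not a function from the original code.

```python
def stuff_bits(bits):
    """Pad a bit string with ones to a whole number of bytes, then insert a
    zero byte after every 0xFF byte so that no marker appears in scan data."""
    bits = bits + "1" * ((-len(bits)) % 8)
    out = []
    for i in range(0, len(bits), 8):
        byte = bits[i:i + 8]
        out.append(byte)
        if byte == "11111111":
            out.append("00000000")
    return "".join(out)

if __name__ == "__main__":
    print(stuff_bits("101"))
    print(stuff_bits("1111111"))
```

The stuffed zero byte is what lets a decoder distinguish an 0xFF data byte from the first byte of a marker such as 0xFFD9.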

6.10 File Operations

function [ count ] = fwrite1_SOI( fileID )

% Reference:
% http://vip.sugovica.hu/Sardi/kepnezo/JPEG%20File%20Layout%20and%20Format.htm

count = fwrite(fileID, [255; 216]);   % 0xFFD8 (SOI)

function [ count ] = fwrite2_APP0( fileID )

% Reference:
% http://vip.sugovica.hu/Sardi/kepnezo/JPEG%20File%20Layout%20and%20Format.htm

A = [255; 224; ...              % 0xFFE0 (APP0)
     0; 16; ...                 % APP0 field length
     74; 70; 73; 70; 0; ...     % JFIF identifier
     1; 2; ...                  % version 1.02
     1; ...                     % units=1 means dots per inch (DPI)
     0; 72; ...                 % X density (DPI)
     0; 72; ...                 % Y density (DPI)
     0; ...                     % X thumbnail width
     0];                        % Y thumbnail height

count = fwrite(fileID, A);
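
The APP0 field length of 16 can be verified by assembling the same bytes and counting them. The Python sketch below is a cross-check of the byte layout written by fwrite2_APP0; the function name app0_segment is an assumption.

```python
def app0_segment():
    """Build the JFIF APP0 segment with the same field values as the listing."""
    body = bytes([0x4A, 0x46, 0x49, 0x46, 0x00,   # "JFIF" followed by a zero byte
                  1, 2,                            # version 1.02
                  1,                               # units: 1 = dots per inch
                  0, 72,                           # X density, 16-bit big-endian
                  0, 72,                           # Y density, 16-bit big-endian
                  0, 0])                           # thumbnail width and height
    length = len(body) + 2                         # the length field counts itself
    return bytes([0xFF, 0xE0]) + length.to_bytes(2, "big") + body

if __name__ == "__main__":
    print(list(app0_segment()))
```

The length field covers everything after the two-byte marker, including the two length bytes themselves, which is why 14 bytes of payload give a length of 16.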

function [ count, DQT ] = fwrite3_DQT( fileID, tableID, DQT, q, fun )

% References:
% http://www.ams.org/samplings/feature-column/fcarc-image-compression
% http://www.whydomath.org/node/wavlets/quantization.html
% http://vip.sugovica.hu/Sardi/kepnezo/JPEG%20File%20Layout%20and%20Format.htm

if nargin < 5
    fun = 'luminance';
end

if nargin < 4
    q = 100;
end

% If a complete 64-element table was not supplied, build one from q and fun.
if numel(DQT) ~= 64
    if (q < 1) || (q > 100)
        error 'q must be between 1 and 100.'
    end

    switch upper(fun)
        case {'C','CB','CR','CHROMINANCE'}
            Z = [17 18 24 47 99 99 99 99; 18 21 26 66 99 99 99 99; ...
                 24 26 56 99 99 99 99 99; 47 66 99 99 99 99 99 99; ...
                 99 99 99 99 99 99 99 99; 99 99 99 99 99 99 99 99; ...
                 99 99 99 99 99 99 99 99; 99 99 99 99 99 99 99 99];
        otherwise % luminance
            Z = [16 11 10 16 24  40  51  61; 12 12 14 19  26  58  60  55; ...
                 14 13 16 24 40  57  69  56; 14 17 22 29  51  87  80  62; ...
                 18 22 37 56 68 109 103  77; 24 35 55 64  81 104 113  92; ...
                 49 64 78 87 103 121 120 101; 72 92 95 98 112 100 103 99];
    end

    if q <= 50
        alpha = 50 / q;
    else
        alpha = 2 - q / 50;
    end

    DQT = round(alpha * Z);
end

A = [255; 219; ...    % 0xFFDB (DQT)
     0; 67; ...       % DQT field length
     tableID];        % destination ID number

A = [A; reshape(zigzag8x8(DQT),[64,1])];

count = fwrite(fileID, A);
function [ count ] = fwrite4_SOF0( fileID, height, width, ncomponents )

% Reference:
% http://vip.sugovica.hu/Sardi/kepnezo/JPEG%20File%20Layout%20and%20Format.htm

A = [255; 192; ...                               % 0xFFC0 (SOF0)
     0; 3*ncomponents+8; ...                     % SOF0 field length
     8; ...                                      % data precision
     floor(height/256); mod(height,256); ...     % image height
     floor(width/256); mod(width,256); ...       % image width
     ncomponents];                               % number of components

for i = 1 : ncomponents
    A = [A; i; ...        % component ID number
         17; ...          % 0x11 sampling factors
         i~=1];           % quantization table number (0 or 1)
end

count = fwrite(fileID, A);
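
The floor/mod pairs in fwrite4_SOF0 split each 16-bit dimension into a high and a low byte. The Python sketch below reproduces the same segment as a cross-check; the function name sof0_segment and the 512-by-512 example size are assumptions made for illustration.

```python
def sof0_segment(height, width, ncomponents):
    """Build a baseline start-of-frame segment laid out as in the listing."""
    seg = bytearray([0xFF, 0xC0])
    seg += (3 * ncomponents + 8).to_bytes(2, "big")   # segment length
    seg.append(8)                                      # sample precision in bits
    seg += height.to_bytes(2, "big")                   # high byte, then low byte
    seg += width.to_bytes(2, "big")
    seg.append(ncomponents)
    for i in range(1, ncomponents + 1):
        # component ID, 1x1 sampling factors, table 0 for luminance else 1
        seg += bytes([i, 0x11, 0 if i == 1 else 1])
    return bytes(seg)

if __name__ == "__main__":
    print(list(sof0_segment(512, 512, 3)))
```

For three components the length field is 17, which equals the segment size minus the two marker bytes.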

function [ count ] = fwrite5_DHT( fileID, tableID, acdc, DHT )

% Reference:
% http://vip.sugovica.hu/Sardi/kepnezo/JPEG%20File%20Layout%20and%20Format.htm
% DHT must be sorted in order by code length.
% DHT{i} = vector of all codes of length i bits.

if (tableID~=0) && (tableID~=1) && (tableID~=2) && (tableID~=3)
    error 'tableID must be 0, 1, 2, or 3.'
end

switch upper(acdc)
    case 'DC'
        acdc = 0;
    case 'AC'
        acdc = 1;
    case {0,1}
    otherwise
        error 'acdc must be 0 (dc) or 1 (ac).'
end

A = [255; 196; ...             % 0xFFC4 (DHT)
     0; 0; ...                 % DHT field length (calculated below)
     16*acdc + tableID; ...    % DHT information byte
     zeros(16,1)];

% Number of codes of each length from 1 to 16 bits.
for i = 1 : numel(DHT)
    A(i+5) = numel(DHT{i});
end

% The symbols themselves, in order of increasing code length.
for i = 1 : numel(DHT)
    A = [A; DHT{i}(:)];
end

A(4) = numel(A) - 2;           % DHT field length

count = fwrite(fileID, A);
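
The layout written by fwrite5_DHT (information byte, sixteen code-length counts, then the symbols) can be sketched as follows. This Python version is an illustration, not the author's code; the name dht_segment and the toy table in the example are assumptions. It writes the length as a full 16-bit value, whereas the listing stores only the low byte, which is sufficient as long as the segment is shorter than 256 bytes.

```python
def dht_segment(table_id, is_ac, symbols_by_length):
    """symbols_by_length[i] lists the symbols whose code is i + 1 bits long."""
    if len(symbols_by_length) > 16:
        raise ValueError("Huffman codes are at most 16 bits long")
    counts = [0] * 16
    symbols = []
    for i, syms in enumerate(symbols_by_length):
        counts[i] = len(syms)
        symbols.extend(syms)
    info = 16 * int(is_ac) + table_id              # high nibble: AC/DC, low: table ID
    body = bytes([info]) + bytes(counts) + bytes(symbols)
    return bytes([0xFF, 0xC4]) + (len(body) + 2).to_bytes(2, "big") + body

if __name__ == "__main__":
    toy = [[], [0], [1, 2, 3, 4, 5], [6]]          # hypothetical table, not a JPEG default
    print(list(dht_segment(0, True, toy)))
```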


function [ count ] = fwrite6_SOS( fileID, ncomponents )

% Reference:
% http://vip.sugovica.hu/Sardi/kepnezo/JPEG%20File%20Layout%20and%20Format.htm

if (ncomponents~=1) && (ncomponents~=2) && (ncomponents~=3) ...
        && (ncomponents~=4)
    error 'ncomponents must be 1, 2, 3, or 4.'
end

A = [255; 218; ...              % 0xFFDA (SOS)
     0; 2*ncomponents+6; ...    % SOS field length
     ncomponents];              % number of components

for i = 1 : ncomponents
    A = [A; i; ...              % component ID number
         17*(i~=1)];            % DC & AC Huffman table ID numbers
end

A = [A; 0; 63; ...              % spectral selection
     0];                        % successive approximation

count = fwrite(fileID, A);

Appendix G. Logical Functions

7.1 Inexact NOT

function [ B ] = not_inexact( A, p )
%Calculates the logical NOT of the input argument A, similar to the
%standard not function, except that each bit has a random error
%probability equal to 1-p.
%
%Inputs:
%A: (logical array) Input argument for the NOT operator.
%p: (scalar) Probability of correctness of each bit within the output B.
%    0 <= p <= 1.
%
%Outputs:
%B: (logical array) B = NOT A, subject to a bitwise random error
%    probability 1-p. B has the same dimensions as A.
%
%Notes:
%If p=1, then B = ~A and is error-free.
%If p=0, then B = A.
%If p=0.5, then B contains completely random data.
%
%Reference:
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic Boolean logic and
%its meaning," Tech. Rep. TR-08-05, Rice University, Department of
%Computer Science, Jun 2008.

B = ~A;
err = (rand(size(B)) > p);   % true where the gate output is wrong
B(err) = ~B(err);
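
The error model used by not_inexact and the other inexact gates can be exercised directly: each output bit is computed exactly and then inverted with probability 1-p. The Python sketch below mirrors that model for a list of booleans; it is an illustration with an assumed signature (the rng argument is added here only to make runs repeatable).

```python
import random

def not_inexact(bits, p, rng=random):
    """Logical NOT of each element, flipped independently with probability 1 - p."""
    out = []
    for a in bits:
        b = not a
        if rng.random() > p:      # same comparison as rand(size(B)) > p
            b = not b
        out.append(b)
    return out

if __name__ == "__main__":
    rng = random.Random(42)
    result = not_inexact([True] * 20000, 0.9, rng)
    # An exact NOT gives all False, so every True marks an error.
    print("observed error rate:", sum(result) / len(result))
```

For p = 0.9 the observed error rate should be close to 0.1; at p = 1 the gate is exact, and at p = 0 it returns its input unchanged.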

7.2 Inexact AND

function [ C ] = and_inexact( A, B, p )
%Calculates the bitwise AND of the input arguments A and B, similar to the
%standard and function, except that each bit has a random error
%probability equal to 1-p.
%
%Inputs:
%A, B: (logical arrays) Input arguments for the AND operator.
%    B must have the same dimensions as A.
%p: (scalar) Probability of correctness of each bit within the output C.
%    0 <= p <= 1.
%
%Outputs:
%C: (logical array) C = A AND B, subject to a bitwise random error
%    probability 1-p. C has the same dimensions as A and B.
%
%Notes:
%If p=1, then C = A AND B and is error-free.
%If p=0, then C = A NAND B.
%If p=0.5, then C contains completely random data.
%
%Reference:
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic Boolean logic and
%its meaning," Tech. Rep. TR-08-05, Rice University, Department of
%Computer Science, Jun 2008.

C = and(A, B);
err = (rand(size(C)) > p);
C(err) = ~C(err);


7.3 Inexact OR

function [ C ] = or_inexact( A, B, p )
%Calculates the bitwise OR of the input arguments A and B, similar to the
%standard or function, except that each bit has a random error
%probability equal to 1-p.
%
%Inputs:
%A, B: (logical arrays) Input arguments for the OR operator.
%    B must have the same dimensions as A.
%p: (scalar) Probability of correctness of each bit within the output C.
%    0 <= p <= 1.
%
%Outputs:
%C: (logical array) C = A OR B, subject to a bitwise random error
%    probability 1-p. C has the same dimensions as A and B.
%
%Notes:
%If p=1, then C = A OR B and is error-free.
%If p=0, then C = A NOR B.
%If p=0.5, then C contains completely random data.
%
%Reference:
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic Boolean logic and
%its meaning," Tech. Rep. TR-08-05, Rice University, Department of
%Computer Science, Jun 2008.

C = or(A, B);
err = (rand(size(C)) > p);
C(err) = ~C(err);


7.4 Inexact XOR

function [ C ] = xor_inexact( A, B, p )
%Calculates the bitwise XOR of the input arguments A and B, similar to the
%standard xor function, except that each bit has a random error
%probability equal to 1-p.
%
%Inputs:
%A, B: (logical arrays) Input arguments for the XOR operator.
%    B must have the same dimensions as A.
%p: (scalar) Probability of correctness of each bit within the output C.
%    0 <= p <= 1.
%
%Outputs:
%C: (logical array) C = A XOR B, subject to a bitwise random error
%    probability 1-p. C has the same dimensions as A and B.
%
%Notes:
%If p=1, then C = A XOR B and is error-free.
%If p=0, then C = A XNOR B.
%If p=0.5, then C contains completely random data.
%
%Reference:
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic Boolean logic and
%its meaning," Tech. Rep. TR-08-05, Rice University, Department of
%Computer Science, Jun 2008.

C = xor(A, B);
err = (rand(size(C)) > p);
C(err) = ~C(err);

7.5 Inexact Multiplexer

function [ Z, Z_ ] = mux2_inexact( S, A0, A1, p, classname, bit )
%mux2_inexact: Two-input multiplexer. Computes the bitwise
%((~S AND A0) OR (S AND A1)), similar to the bitand function,
%except that each AND, OR, and NOT gate has a random
%error probability equal to 1-p.
%
%Inputs:
%S: (nonnegative integer array) Selector.
%A0, A1: (nonnegative integer arrays) Input signals for the
%    multiplexer. S, A0, and A1 must all have the same dimensions.
%p: (scalar) Probability of correctness of each bit within
%    the output Z. 0 <= p <= 1.
%classname: (string) The class name of the output arrays Z
%    and Z_.
%bit: (integer vector) Which bit positions can be inexact.
%    Position 1 is the lowest-order bit. (optional) If bit
%    is omitted, then all positions can be inexact.
%
%Outputs:
%Z: (integer array) Z = ((~S AND A0) OR (S AND A1)), subject
%    to a bitwise random error probability 1-p. Z has
%    the same dimensions as S, A0, and A1.
%Z_: (integer array) Z_ = ((S AND A0) OR (~S AND A1)), subject
%    to a bitwise random error probability 1-p. Z_ has
%    the same dimensions as S, A0, and A1.
%
%Notes:
%If p=1, then Z = ((~S AND A0) OR (S AND A1)) and is
%    error-free.
%If p=0, then Z = ((~~S NAND A0) NOR (S NAND A1)).
%If p=0.5, then Z contains completely random data.
%
%Reference:
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic
%Boolean logic and its meaning," Tech. Rep. TR-08-05, Rice
%University, Department of Computer Science, Jun 2008.

if ~exist('p','var')
    p = 1;
end

if ~exist('classname','var')
    classname = class(bitor(bitand(bitcmp(S(1)), A0(1)), ...
        bitand(S(1), A1(1))));
end

switch classname
    case 'logical'
        n = 1;
    case {'uint8', 'int8'}
        n = 8;
    case {'uint16', 'int16'}
        n = 16;
    case {'uint32', 'int32'}
        n = 32;
    case {'uint64', 'int64'}
        n = 64;
    otherwise
        error 'classname must be logical, uint8, uint16, uint32, uint64, int8, int16, int32, or int64.'
end

if ~exist('bit','var')
    bit = 1 : n;
end

if islogical(S)
    S_ = not_inexact(S, p);

    Z0 = and_inexact(S_, A0, p);
    Z1 = and_inexact(S, A1, p);
    Z = or_inexact(Z0, Z1, p);

    Z2 = and_inexact(S, A0, p);
    Z3 = and_inexact(S_, A1, p);
    Z_ = or_inexact(Z2, Z3, p);
else
    S_ = bitcmp_inexact(S, n, p, classname, bit);

    Z0 = bitand_inexact(S_, A0, p, classname, bit);
    Z1 = bitand_inexact(S, A1, p, classname, bit);
    Z = bitor_inexact(Z0, Z1, p, classname, bit);

    Z2 = bitand_inexact(S, A0, p, classname, bit);
    Z3 = bitand_inexact(S_, A1, p, classname, bit);
    Z_ = bitor_inexact(Z2, Z3, p, classname, bit);
end
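
In the error-free limit (p = 1) the multiplexer reduces to a plain bitwise select, which gives a reference value against which inexact outputs can be compared. The Python sketch below shows that limit only; the name mux2 and the nbits parameter are assumptions, and no gate errors are modeled.

```python
def mux2(s, a0, a1, nbits=8):
    """Exact bitwise 2-to-1 multiplexer on nbits-wide nonnegative integers.

    Returns (z, z_): z takes each bit from a1 where s is 1 and from a0 where
    s is 0; z_ is the complementary selection.
    """
    mask = (1 << nbits) - 1
    s_not = ~s & mask                    # nbits-wide complement of the selector
    z = (s_not & a0) | (s & a1)
    z_ = (s & a0) | (s_not & a1)
    return z, z_

if __name__ == "__main__":
    z, z_ = mux2(0b11110000, 0b10101010, 0b01010101)
    print(format(z, "08b"), format(z_, "08b"))
```

Each output bit of the inexact version passes through a NOT, an AND, and an OR gate, so it is exposed to three independent chances of error rather than one.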

7.6 Inexact AND-OR-2-1

function [ D ] = AO21_inexact( A, B, C, p, classname, bit )
%Calculates the bitwise (A OR (B AND C)), similar to the bitand function,
%except that each bit has a random error probability equal to 1-p.
%
%Inputs:
%A, B, C: (nonnegative integer arrays) Input arguments for the AOI
%    function. A, B, and C must all have the same dimensions.
%p: (scalar) Probability of correctness of each bit within the output D.
%    0 <= p <= 1.
%classname: (string) The class name of the output array D.
%bit: (integer vector) Which bit positions can be inexact. Position 1 is
%    lowest-order bit. (optional) If bit is omitted, then all positions can
%    be inexact.
%
%Outputs:
%D: (nonnegative integer array) D = (A OR (B AND C)), subject to a
%    bitwise random error probability 1-p. D has the same dimensions as A,
%    B, and C.
%
%Notes:
%If p=1, then D = (A OR (B AND C)) and is error-free.
%If p=0, then D = (A NOR (B AND C)) (and-or-invert).
%If p=0.5, then D contains completely random data.
%
%Reference:
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic Boolean logic and
%its meaning," Tech. Rep. TR-08-05, Rice University, Department of
%Computer Science, Jun 2008.

D0 = bitand(B, C);
if nargin < 5
    D = bitor(A, D0);
    classname = class(D);
else
    D = zeros(size(A), classname);
    D(:) = bitor(A, D0);
end

% Flip the output bits selected by a random error mask.
if nargin >= 6
    err = biterrors(size(D), p, classname, bit);
else
    err = biterrors(size(D), p, classname);
end

D = bitxor(D, err);
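
Unlike the multiplexer, AO21_inexact treats the compound gate as a single unit: the exact result is computed first and then XORed with one random error mask. The Python sketch below illustrates that structure for scalar integers; the names bit_error_mask and ao21_inexact, the nbits parameter, and the rng argument are assumptions made for this illustration.

```python
import random

def bit_error_mask(p, nbits, rng):
    """Random mask in which each of the nbits bits is 1 with probability 1 - p."""
    mask = 0
    for k in range(nbits):
        if rng.random() > p:
            mask |= 1 << k
    return mask

def ao21_inexact(a, b, c, p, nbits=8, rng=None):
    """Compute a | (b & c), then flip the bits selected by the error mask."""
    rng = rng or random.Random()
    exact = a | (b & c)
    return exact ^ bit_error_mask(p, nbits, rng)

if __name__ == "__main__":
    rng = random.Random(0)
    print(format(ao21_inexact(0b1000, 0b0110, 0b0011, 1.0, 8, rng), "08b"))
    print(format(ao21_inexact(0b1000, 0b0110, 0b0011, 0.0, 8, rng), "08b"))
```

At p = 1 the mask is all zeros and the output is exact; at p = 0 every bit is flipped, which is the and-or-invert case described in the notes.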

7.7 Inexact AND-OR-AND-OR-2-1-1-1

function [ F ] = AOAO2111_inexact( A, B, C, D, E, p, classname, bit )
%AOAO2111_inexact: Computes the bitwise (A OR (B AND (C OR (D AND E)))),
%similar to the bitand function, except that each bit has a random error
%probability equal to 1-p.

F0 = bitand(D, E);
F1 = bitor(C, F0);
F2 = bitand(B, F1);
if nargin < 7
    F = bitor(A, F2);
    classname = class(F);
else
    F = zeros(size(A), classname);
    F(:) = bitor(A, F2);
end

if nargin >= 8
    err = biterrors(size(F), p, classname, bit);
else
    err = biterrors(size(F), p, classname);
end

F = bitxor(F, err);

7.8 Inexact n-Input AND

function [ C ] = and3_inexact( A, p )
%Performs the logical AND of all elements along the rows of
%a two-dimensional array A. The output is a column vector.
%This function simulates an inexact N-input AND gate (where
%N is the number of columns of A), by ANDing the inputs
%pairwise in a binary tree of 2-input inexact AND gates.
%Each 2-input AND gate has a probability of correctness p
%and a probability of error 1-p.
%
%Note that since it is a binary tree structure, if there is
%an odd number of inputs (that is, if N is not a power of 2)
%then the rightmost columns of A are evaluated last. Therefore,
%the rightmost columns suffer from less inexactness
%than the rest of the array.
%
%Inputs:
%A: (2-dimensional logical array) Input data.
%p: (scalar) Probability of correctness of each 2-input AND
%    gate within the binary tree. (0 <= p <= 1)
%
%Output:
%C: (column vector of logicals) Approximate N-input AND of
%    each row of A.

if ~exist('p','var')
    p = 1;
end

% Each outer pass halves the number of columns by combining adjacent pairs.
for i = 1 : ceil(log2(size(A,2)))
    for j = 1 : size(A,2)
        if j < size(A,2)
            A = [A(:,1:(j-1)), and_inexact(A(:,j),A(:,j+1),p), ...
                A(:,(j+2):end)];
        else
            break
        end
    end
end

C = A;
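
The pairwise reduction in and3_inexact can be traced with a small sketch that counts gates and levels. The Python version below takes the 2-input gate as a parameter so that an exact gate can be substituted for checking; the name and_tree and its return convention are assumptions, not part of the original code.

```python
def and_tree(inputs, gate):
    """Reduce a non-empty list of inputs pairwise, level by level.

    Returns (result, gates_used, depth). An unpaired last input is carried to
    the next level unchanged, as the rightmost column is in the listing.
    """
    level = list(inputs)
    gates = 0
    depth = 0
    while len(level) > 1:
        nxt = []
        for j in range(0, len(level) - 1, 2):
            nxt.append(gate(level[j], level[j + 1]))
            gates += 1
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], gates, depth

if __name__ == "__main__":
    exact_and = lambda a, b: a and b
    print(and_tree([True] * 5, exact_and))
    print(and_tree([True, True, False, True], exact_and))
```

An N-input reduction always uses N-1 two-input gates over ceil(log2(N)) levels; inputs carried forward unpaired pass through fewer gates, which is why the rightmost columns see less inexactness.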

7.9 Inexact n-Input OR

function [ C ] = or3_inexact( A, p )
%Performs the logical OR of all elements along the rows of
%a two-dimensional array A. The output is a column vector.
%This function simulates an inexact N-input OR gate (where
%N is the number of columns of A), by ORing the inputs
%pairwise in a binary tree of 2-input inexact OR gates.
%Each 2-input OR gate has a probability of correctness p
%and a probability of error 1-p.
%
%Note that since it is a binary tree structure, if there is
%an odd number of inputs (that is, if N is not a power of 2)
%then the rightmost columns of A are evaluated last. Therefore,
%the rightmost columns suffer from less inexactness
%than the rest of the array.
%
%Inputs:
%A: (2-dimensional logical array) Input data.
%p: (scalar) Probability of correctness of each 2-input OR
%    gate within the binary tree. (0 <= p <= 1)
%
%Output:
%C: (column vector of logicals) Approximate N-input OR of
%    each row of A.

if ~exist('p','var')
    p = 1;
end

for i = 1 : ceil(log2(size(A,2)))
    for j = 1 : size(A,2)
        if j < size(A,2)
            A = [A(:,1:(j-1)), or_inexact(A(:,j),A(:,j+1),p), ...
                A(:,(j+2):end)];
        else
            break
        end
    end
end

C = A;

Appendix H. Bitwise Functions

8.1 N-Bit One's Complement (Exact)

function [ cmp ] = bitcmp0( A, N )
%Returns an N-bit complement of the input A.
%Same as bitcmp(A,N) which is deprecated.

switch class(A)
    case {'int8', 'uint8'}
        nmax = 8;
    case {'int16', 'uint16'}
        nmax = 16;
    case {'int32', 'uint32'}
        nmax = 32;
    case {'int64', 'uint64'}
        nmax = 64;
    case {'double', 'single'}
        if any(A < 0) || any(A ~= floor(A))
            c = class(A);
            c(1) = upper(c(1));
            error([c, ' inputs must be nonnegative integers.'])
        elseif any(A > intmax('uint64'))
            error 'Values in A should not have "on" bits in positions greater than N.'
        end
        A = uint64(A);
        nmax = 64;
    case 'logical'
        A = uint8(A);
        nmax = 1;
    otherwise
        error 'Operands to bitcmp0 must be numeric.'
end

if ~exist('N','var')
    N = nmax;
end

if any(N < 0) || any(N > nmax) || any(N ~= floor(N))
    error 'Number of bits must be an integer within the range of the input A.'
end

% Build a mask of N ones and flip those bits of A.
if N < 64
    OxFFFF = cast(pow2a(N, 'uint64') - 1, 'like', A);
elseif intmin(class(A)) < 0
    OxFFFF = intmin('int64');
else
    OxFFFF = intmax('uint64');
end

cmp = bitxor(A, OxFFFF);
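
The core of bitcmp0 is an XOR with a mask of N ones. A minimal Python sketch of that operation for a nonnegative scalar, under the assumption that the value fits in N bits (the name bitcmp_n is hypothetical):

```python
def bitcmp_n(a, n):
    """N-bit one's complement of a nonnegative integer a: flip its low n bits."""
    if a < 0 or n < 0:
        raise ValueError("a and n must be nonnegative")
    return a ^ ((1 << n) - 1)

if __name__ == "__main__":
    print(bitcmp_n(5, 3))     # 101 -> 010
    print(bitcmp_n(0, 8))     # 00000000 -> 11111111
```

Applying the function twice with the same N returns the original value, which is the property the run-amplitude encoder relies on when it writes negative amplitudes.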
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Majority  Function  (Exact) 


function [ D ] = majority( A, B, C )
%majority: Computes the bitwise majority of A, B, and C,
%similar to the bitand function.

D0 = bitand(bitand(bitcmp(A), B), C);
D1 = bitand(bitand(A, bitcmp(B)), C);
D2 = bitand(bitand(A, B), bitcmp(C));
D3 = bitand(bitand(A, B), C);
D = bitor(bitor(bitor(D0, D1), D2), D3);

end
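The listing builds the majority as the OR of the four minterms in which at least two inputs are 1. As a hedged illustration in Python (not the author's code; the function names and the 8-bit word size are assumptions), the sketch below writes that form next to the equivalent pairwise form (A AND B) OR (A AND C) OR (B AND C).

```python
WIDTH = 8                      # assumed word size for the complement
MASK = (1 << WIDTH) - 1


def majority_minterms(a: int, b: int, c: int) -> int:
    """OR of the four minterms in which at least two inputs are 1."""
    na, nb, nc = a ^ MASK, b ^ MASK, c ^ MASK
    return (na & b & c) | (a & nb & c) | (a & b & nc) | (a & b & c)


def majority_pairs(a: int, b: int, c: int) -> int:
    """Equivalent form: a bit is set when any two inputs are both 1."""
    return (a & b) | (a & c) | (b & c)
```

The pairwise form needs no complement, so it uses fewer gates when the inverted inputs are not already available.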
8.3 Bitwise Error Generator

This function is used by many higher functions to simulate unreliable computation.

function [ err ] = biterrors( outputsize, p, classname, bit )
%Generates an array of random integers. The numbers are generated bitwise
%such that p is the probability that each bit is 0, and 1-p is the
%probability that each bit is 1.
%
%Inputs:
%outputsize: (vector) The dimensions of the output array err.
%p: (scalar) The probability that each output bit is a zero. 0 <= p <= 1
%classname: (string) The class name of the output array err.
%bit: (integer vector) Which bit positions can be nonzero. Position 1 is
%    lowest-order bit. (optional) If bit is omitted, then all positions can
%    be nonzero.
%
%Output:
%err: (integer array) Random array of dimensions specified by outputsize.

switch classname
    case {'uint8', 'int8'}
        n = 8;
    case {'uint16', 'int16'}
        n = 16;
    case {'uint32', 'int32'}
        n = 32;
    case {'uint64', 'int64'}
        n = 64;
    otherwise
        error 'classname must be uint8, uint16, uint32, uint64, int8, int16, int32, or int64.'
end

err = zeros(outputsize, classname);
err0 = (rand([numel(err), n]) > p);    % generate random binary digits
twos = pow2(n-1 : -1 : 0);
err(:) = sum(err0 .* twos(ones(numel(err),1),:), 2);    % convert binary to dec

if nargin >= 4
    bit = bit((bit >= 1) & (bit <= n));
    b = zeros(classname);
    b(:) = sum(bitset(zeros(classname), bit(:)));
    err = bitand(err, b);
end
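The error generator above draws one random bit per position and packs the bits into integers. The following is a minimal Python sketch of the same idea, under stated assumptions: the name `bit_errors`, its argument order, and the injectable `rng` are hypothetical. It differs from the listing in two ways: it draws random numbers only for the allowed positions instead of masking afterwards, and it compares with `>=` so that p = 0 and p = 1 are exact for a generator on [0, 1).

```python
import random


def bit_errors(count: int, p: float, nbits: int, positions=None, rng=None):
    """Return `count` masks of width nbits; each allowed bit is 1 with probability 1-p."""
    rng = rng or random.Random()
    if positions is None:
        positions = range(1, nbits + 1)      # position 1 is the lowest-order bit
    allowed = [b for b in positions if 1 <= b <= nbits]
    masks = []
    for _ in range(count):
        m = 0
        for b in allowed:
            if rng.random() >= p:            # the listing uses rand(...) > p
                m |= 1 << (b - 1)
        masks.append(m)
    return masks
```

Passing a seeded `random.Random` makes a simulation run repeatable, which helps when comparing designs at the same value of p.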
8.4 N-Bit One's Complement (Inexact)


function [ C ] = bitcmp_inexact( A, n, p, classname, bit )
%Calculates the n-bit complement of the input argument A, similar to the
%standard bitcmp function, except that each bit has a random error
%probability equal to 1-p.
%
%Inputs:
%A: (nonnegative integer array) Input argument for the bitcmp function.
%n: (integer) Number of bits to complement. The lowest n bits are
%    complemented. (optional) If n is omitted, then all bits are complemented.
%p: (scalar) Probability of correctness of each bit within the output C.
%    0 <= p <= 1.
%classname: (string) The class name of the output array C.
%bit: (integer vector) Which bit positions can be inexact. Position 1 is
%    lowest-order bit. (optional) If bit is omitted, then all positions can
%    be inexact.
%
%Outputs:
%C: (nonnegative integer array) The bitwise complement of A, subject to a
%    bitwise random error probability 1-p. C has the same dimensions as A.
%
%Notes:
%If p = 1, then C = bitcmp(A) and is error-free.
%If p = 0, then C = A.
%If p = 0.5, then C contains completely random data.
%
%Reference:
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic Boolean logic and
%its meaning," Tech. Rep. TR-08-05, Rice University, Department of
%Computer Science, Jun 2008.

if ~exist('classname','var')
    classname = class(A);
end

if exist('n','var')
    C = bitcmp0(A, n);
else
    C = bitcmp0(A);
end

if exist('bit','var')
    err = biterrors(size(C), p, classname, bit);
else
    err = biterrors(size(C), p, classname);
end

C = bitxor(C, err);
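The inexact operators in this appendix share one pattern: compute the exact result, then XOR it with a random error mask. The Python sketch below illustrates that pattern for the complement and reproduces the two limiting cases in the Notes. It is an illustration under assumptions, not the author's code: the signature is hypothetical, errors are confined to the low n bits, and `>=` replaces the listing's `>` so that both endpoints are deterministic.

```python
import random


def bitcmp_inexact(a: int, n: int, p: float, rng=None) -> int:
    """Complement the low n bits of a, then flip each of them with probability 1-p."""
    rng = rng or random.Random()
    exact = a ^ ((1 << n) - 1)            # error-free n-bit complement
    err = 0
    for k in range(n):
        if rng.random() >= p:             # bit k is wrong with probability 1-p
            err |= 1 << k
    return exact ^ err                    # XOR injects the bit errors
```

With p = 1 the mask is zero and the result is the exact complement; with p = 0 every bit is flipped back, so the output equals the input.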
8.5 Inexact Bitwise AND


function [ C ] = bitand_inexact( A, B, p, classname, bit )
%bitand_inexact: Calculates the bitwise AND of the input arguments A and
%B, similar to the standard bitand function, except that each bit has a
%random error probability equal to 1-p.
%
%Inputs:
%A, B: (nonnegative integer arrays) Input arguments for the AND operator.
%    B must have the same dimensions as A.
%p: (scalar) Probability of correctness of each bit within the output C.
%    0 <= p <= 1.
%classname: (string) The class name of the output array C.
%bit: (integer vector) Which bit positions can be inexact. Position 1 is
%    lowest-order bit. (optional) If bit is omitted, then all positions can
%    be inexact.
%
%Outputs:
%C: (nonnegative integer array) C = A AND B, subject to a bitwise random
%    error probability 1-p. C has the same dimensions as A and B.
%
%Notes:
%If p = 1, then C = A AND B and is error-free.
%If p = 0, then C = A NAND B.
%If p = 0.5, then C contains completely random data.
%
%Reference:
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic Boolean logic and
%its meaning," Tech. Rep. TR-08-05, Rice University, Department of
%Computer Science, Jun 2008.

C = bitand(A, B);

if exist('classname','var')
    C = cast(C, classname);
else
    classname = class(C);
end

if exist('bit','var')
    err = biterrors(size(C), p, classname, bit);
else
    err = biterrors(size(C), p, classname);
end

C = bitxor(C, err);

8.6 Inexact Bitwise OR


function [ C ] = bitor_inexact( A, B, p, classname, bit )
%Calculates the bitwise OR of the input arguments A and B, similar to the
%standard bitor function, except that each bit has a random error
%probability equal to 1-p.
%
%Inputs:
%A, B: (nonnegative integer arrays) Input arguments for the OR operator.
%    B must have the same dimensions as A.
%p: (scalar) Probability of correctness of each bit within the output C.
%    0 <= p <= 1.
%classname: (string) The class name of the output array C.
%bit: (integer vector) Which bit positions can be inexact. Position 1 is
%    lowest-order bit. (optional) If bit is omitted, then all positions can
%    be inexact.
%
%Outputs:
%C: (nonnegative integer array) C = A OR B, subject to a bitwise random
%    error probability 1-p. C has the same dimensions as A and B.
%
%Notes:
%If p = 1, then C = A OR B and is error-free.
%If p = 0, then C = A NOR B.
%If p = 0.5, then C contains completely random data.
%
%Reference:
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic Boolean logic and
%its meaning," Tech. Rep. TR-08-05, Rice University, Department of
%Computer Science, Jun 2008.

if nargin < 4
    C = bitor(A, B);
    classname = class(C);
else
    C = zeros(size(A), classname);
    C(:) = bitor(A, B);
end

if nargin >= 5
    err = biterrors(size(C), p, classname, bit);
else
    err = biterrors(size(C), p, classname);
end

C = bitxor(C, err);

8.7 Inexact Bitwise XOR


function [ C ] = bitxor_inexact( A, B, p, classname, bit )
%bitxor_inexact: Calculates the bitwise XOR of the input arguments A and
%B, similar to the standard bitxor function, except that each bit has a
%random error probability equal to 1-p.
%
%Inputs:
%A, B: (nonnegative integer arrays) Input arguments for the XOR operator.
%    B must have the same dimensions as A.
%p: (scalar) Probability of correctness of each bit within the output C.
%    0 <= p <= 1.
%classname: (string) The class name of the output array C.
%bit: (integer vector) Which bit positions can be inexact. Position 1 is
%    lowest-order bit. (optional) If bit is omitted, then all positions can
%    be inexact.
%
%Outputs:
%C: (nonnegative integer array) C = A XOR B, subject to a bitwise random
%    error probability 1-p. C has the same dimensions as A and B.
%
%Notes:
%If p = 1, then C = A XOR B and is error-free.
%If p = 0, then C = A XNOR B.
%If p = 0.5, then C contains completely random data.
%
%Reference:
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic Boolean logic and
%its meaning," Tech. Rep. TR-08-05, Rice University, Department of
%Computer Science, Jun 2008.

if nargin < 4
    C = bitxor(A, B);
    classname = class(C);
else
    C = zeros(size(A), classname);
    C(:) = bitxor(A, B);
end

if nargin >= 5
    err = biterrors(size(C), p, classname, bit);
else
    err = biterrors(size(C), p, classname);
end

C = bitxor(C, err);

8.8 Inexact 4-Input Bitwise AND


function [ E ] = bitand4_inexact( A, B, C, D, p, classname, bit )
%Calculates the bitwise (A AND B AND C AND D), similar to the bitand
%function, except that each bit has a random error probability equal to 1-p.
%
%Inputs:
%A, B, C, D: (nonnegative integer arrays) Input arguments for the AND4
%    function. A, B, C, and D must all have the same dimensions.
%p: (scalar) Probability of correctness of each bit within the output E.
%    0 <= p <= 1.
%classname: (string) The class name of the output array E.
%bit: (integer vector) Which bit positions can be inexact. Position 1 is
%    lowest-order bit. (optional) If bit is omitted, then all positions can
%    be inexact.
%
%Outputs:
%E: (nonnegative integer array) E = (A AND B AND C AND D), subject to a
%    bitwise random error probability 1-p. E has the same dimensions as A,
%    B, C, and D.
%
%Notes:
%If p = 1, then E = (A AND B AND C AND D) and is error-free.
%If p = 0, then E equals the bitwise complement of (A AND B AND C AND D).
%If p = 0.5, then E contains completely random data.
%
%Reference:
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic Boolean logic and
%its meaning," Tech. Rep. TR-08-05, Rice University, Department of
%Computer Science, Jun 2008.

E0 = bitand(B, bitand(C, D));

if nargin < 6
    E = bitand(A, E0);
    classname = class(D);
else
    E = zeros(size(A), classname);
    E(:) = bitand(A, E0);
end

if nargin >= 7
    err = biterrors(size(E), p, classname, bit);
else
    err = biterrors(size(E), p, classname);
end

E = bitxor(E, err);

Appendix I. Advanced Bitwise Functions

9.1 Unsigned to Signed Class Conversion

function [ b, sgn, signedclass ] = signed( a, classname )

if ~exist('classname','var')
    classname = regexprep(class(a), 'uint', 'int');
end

if isa(a, 'uint8')
    signedclass = false;
    n = 8;
    switch classname
        case {'int8','int16','int32','int64'}
            b = zeros(size(a), classname);
        otherwise
            error 'For 8-bit input, classname must be int8, int16, int32, or int64.'
    end
elseif isa(a, 'uint16')
    signedclass = false;
    n = 16;
    switch classname
        case {'int16','int32','int64'}
            b = zeros(size(a), classname);
        otherwise
            error 'For 16-bit input, classname must be int16, int32, or int64.'
    end
elseif isa(a, 'uint32')
    signedclass = false;
    n = 32;
    switch classname
        case {'int32','int64'}
            b = zeros(size(a), classname);
        otherwise
            error 'For 32-bit input, classname must be int32 or int64.'
    end
elseif isa(a, 'uint64')
    signedclass = false;
    n = 64;
    switch classname
        case 'int64'
            b = zeros(size(a), classname);
        otherwise
            error 'For 64-bit input, classname must be int64.'
    end
else
    signedclass = true;
    b = a;
    if nargout >= 2
        sgn = (a < 0);
    end
end

if ~signedclass
    OxFFFF = intmax(class(a));
    sgn = logical(bitget(a, n));
    acomp = bitxor(a(sgn), OxFFFF);
    m = (acomp == intmax(classname));
    b1 = -cast(acomp + 1, classname);
    b1(m) = intmin(classname);
    b(sgn) = b1;
    b(~sgn) = abs(a(~sgn));
end
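The conversion above reinterprets an unsigned bit pattern as a two's complement number: when the top bit is set, it complements the pattern, adds one, and negates. A minimal Python sketch of the same arithmetic follows. The name `to_signed` is hypothetical, and because Python integers are unbounded, the listing's special case for the most negative value is not needed.

```python
def to_signed(u: int, nbits: int) -> int:
    """Interpret the nbits-wide unsigned value u as a two's complement number."""
    if not 0 <= u < (1 << nbits):
        raise ValueError("u does not fit in nbits")
    sign_bit = 1 << (nbits - 1)
    # Subtracting 2**nbits when the top bit is set equals -((u XOR all-ones) + 1).
    return u - (1 << nbits) if u & sign_bit else u
```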
9.2 Signed to Unsigned Class Conversion


function [ b, sgn, signedclass ] = unsigned( a, classname )

if ~exist('classname','var')
    classname = regexprep(class(a), '^int', 'uint');
end

if isa(a, 'int8')
    signedclass = true;
    switch classname
        case {'uint8','uint16','uint32','uint64'}
            b = zeros(size(a), classname);
        otherwise
            error 'For 8-bit input, classname must be uint8, uint16, uint32, or uint64.'
    end
elseif isa(a, 'int16')
    signedclass = true;
    switch classname
        case {'uint16','uint32','uint64'}
            b = zeros(size(a), classname);
        otherwise
            error 'For 16-bit input, classname must be uint16, uint32, or uint64.'
    end
elseif isa(a, 'int32')
    signedclass = true;
    switch classname
        case {'uint32','uint64'}
            b = zeros(size(a), classname);
        otherwise
            error 'For 32-bit input, classname must be uint32 or uint64.'
    end
elseif isa(a, 'int64')
    signedclass = true;
    switch classname
        case 'uint64'
            b = zeros(size(a), classname);
        otherwise
            error 'For 64-bit input, classname must be uint64.'
    end
else
    signedclass = false;
    b = a;
    if nargout >= 2
        sgn = (a < 0);
    end
end

if signedclass
    OxFFFF = intmax(classname);
    sgn = (a < 0);
    m = (a == intmin(class(a)));
    b(sgn) = bitxor(cast(-a(sgn), classname), OxFFFF) + 1;
    b(m) = cast(intmax(class(a)), classname) + 1;
    b(~sgn) = a(~sgn);
end
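The listing above maps a negative value to the bit pattern obtained by complementing its magnitude and adding one. In a language with unbounded integers this reduces to masking with the word size, as the following Python sketch shows. The name `to_unsigned` is hypothetical, and the sketch assumes the output word has the same width as the input.

```python
def to_unsigned(s: int, nbits: int) -> int:
    """Return the nbits-wide two's complement bit pattern of the signed value s."""
    half = 1 << (nbits - 1)
    if not -half <= s < half:
        raise ValueError("s does not fit in nbits")
    return s & ((1 << nbits) - 1)     # masking wraps negative values modulo 2**nbits
```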
9.3 Clear Upper Bits


function B = correct_upperbits( A, n )
%Takes an integer array A, stored as an m-bit signed or
%unsigned integer class (where m=8,16,32, or 64), and an
%integer n, and:
%  (1) For each A>=0, or if A is unsigned, clears the
%      uppermost m-n bits of A, and
%  (2) For each A<0, sets the uppermost m-n bits of A.
%
%The lower n bits remain unchanged.
%
%Inputs:
%  A: (integer array) Input data. Must be one of the
%     integer classes.
%  n: (integer) The number of lower bits of A which will
%     remain unchanged.
%
%Output:
%  B: (integer array) Output data, with upper bits set or
%     cleared as described above. B is the same class as A.

switch class(A)
    case 'int8'
        m = 8;   signedA = true;
    case 'uint8'
        m = 8;   signedA = false;
    case 'int16'
        m = 16;  signedA = true;
    case 'uint16'
        m = 16;  signedA = false;
    case 'int32'
        m = 32;  signedA = true;
    case 'uint32'
        m = 32;  signedA = false;
    case 'int64'
        m = 64;  signedA = true;
    case 'uint64'
        m = 64;  signedA = false;
    otherwise
        error 'Input A must be of the integer classes.'
end

if signedA
    signA = logical(bitget(A, m));
    OxFFFF = cast(-1, 'like', A);
    OxF000 = bitshift(OxFFFF, n);
    Ox0FFF = bitcmp(OxF000);
    B = zeros(size(A), 'like', A);
    if isscalar(n)
        B(signA) = bitor(A(signA), OxF000);
        B(~signA) = bitand(A(~signA), Ox0FFF);
    else
        B(signA) = bitor(A(signA), OxF000(signA));
        B(~signA) = bitand(A(~signA), Ox0FFF(~signA));
    end
else
    OxFFFF = intmax(class(A));
    OxF000 = bitshift(OxFFFF, n);
    Ox0FFF = bitcmp(OxF000);
    B = bitand(A, Ox0FFF);
end
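The function above fills the upper m-n bits with ones for negative values and with zeros otherwise, leaving the low n bits alone. The Python sketch below shows the same masking on a raw m-bit pattern. It is a hedged illustration: the keyword arguments `m` and `signed` are hypothetical stand-ins for the MATLAB class of A, and the result is returned as an unsigned bit pattern.

```python
def correct_upperbits(a: int, n: int, m: int = 8, signed: bool = True) -> int:
    """Keep the low n bits of the m-bit pattern a; fill the upper m-n bits by sign."""
    word = (1 << m) - 1
    low = (1 << n) - 1
    a &= word                                   # work on the raw m-bit pattern
    negative = signed and bool((a >> (m - 1)) & 1)
    if negative:
        return a | (word & ~low)                # set the uppermost m-n bits
    return a & low                              # clear the uppermost m-n bits
```

For example, with m = 8 and n = 4 the pattern 1000 0101 becomes 1111 0101 when treated as signed and 0000 0101 when treated as unsigned.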
9.4 Test if an N-Bit Number is Nonzero (Inexact)


function H = any_high_bits_inexact_PBL( X, n, p )
%Computes the bitwise OR of all bits in X. If any bits in X are nonzero,
%then H is true; otherwise, H is false. The algorithm uses a binary tree
%of 2-input OR gates to form an n-input OR gate.

if ~exist('p','var')
    p = 1;
end

k = [64,32,16,8,4,2,1];
k = k(k <= n);
OxFFFF = bitcmp0(zeros('like', X), n);
X = bitand(X, OxFFFF);

for k = k
    if mod(n,2)    % if n is odd, then OR the last two bits together
        X = bitset(X, 2, or_inexact(bitget(X,2), bitget(X,1), p));
        X = bitshift(X, -1);
        n = n - 1;
        OxFFFF = bitcmp0(zeros('like', X), n);
        X = bitand(X, OxFFFF);
    end
    for i = 1 : k
        if (i+k) <= n
            X = bitset(X, i, or_inexact(bitget(X,i), bitget(X,i+k), p));
        end
    end
    n = k;
    OxFFFF = bitcmp0(zeros('like', X), n);
    X = bitand(X, OxFFFF);
end

H = logical(bitget(X, 1));
end
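The function above reduces n bits to one with a tree of 2-input OR gates. The Python sketch below shows an error-free version of such a tree and counts the gates it uses. It is an illustration under assumptions: the name `any_high_bits` is hypothetical, n is assumed to be at least 1, and the sketch pairs neighbouring bits, whereas the listing folds the upper half of the word onto the lower half. Both arrangements use n-1 gates.

```python
def any_high_bits(x: int, n: int):
    """OR-reduce the low n bits of x with 2-input gates; return (result, gate_count)."""
    bits = [(x >> i) & 1 for i in range(n)]
    gates = 0
    while len(bits) > 1:
        nxt = []
        for i in range(0, len(bits) - 1, 2):    # pair neighbouring bits
            nxt.append(bits[i] | bits[i + 1])
            gates += 1
        if len(bits) % 2:                        # an odd bit passes through unchanged
            nxt.append(bits[-1])
        bits = nxt
    return bool(bits[0]), gates
```

The gate count matters in the inexact setting because every gate is a separate opportunity for error.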
9.5 Inexact Barrel Shifter


function [ C, stickybit, Cs ] = bitshifter_inexact_PBL( A, B, na, nb, p )
%bitshifter returns A shifted B bits to the left (equivalent
%to multiplying A by 2^B), similar to the Matlab bitshift
%function, except this function simulates a barrel shifter
%in a digital electronic circuit. If B is positive, then A
%is shifted left. If B is negative, then A is shifted right.
%Any overflow or underflow bits are truncated.
%
%This barrel shifter is subject to random errors at each
%node in the circuit -- that is, every AND or OR gate has a
%random error probability equal to 1-p.
%
%Inputs:
%A: (integer array) Number(s) to be shifted.
%B: (integer array) Number of bit positions that A is to be
%    shifted.
%na: (integer) Word size (number of bits) of A.
%nb: (integer) Word size of B.
%p: (scalar) Probability of correctness of each AND or OR
%    inside the barrel shifter. 0 <= p <= 1.
%
%Output:
%C: (integer array) The result of shifting A to the left by
%    B bits.
%stickybit: (logical array) The logical OR of all bits lost
%    due to truncation.
%Cs: (integer array) For B<0, the stickybit is OR'd with
%    the least significant bit of C. For B>0, the stickybit
%    is OR'd with the most significant bit of C.

if ~exist('p','var')
    p = 1;
end

sticky = (nargout >= 2);
OxFFFF = cast(pow2a(na,'uint64') - 1, 'like', A);
S = sign(B);
B = abs(B);
L = (S > 0);    % left shift
R = ~L;         % right shift
C = bitand(A, OxFFFF);
bb = zeros(size(B), 'like', A);
bb_ = bb;
ilsb = bb;

for i = 1 : nb
    bb(:) = bitget(B(:), i);
    bb_(:) = 1 - bb(:);
    bb(:) = bb(:) * OxFFFF;
    bb_(:) = bb_(:) * OxFFFF;
    k = pow2(i-1);
    C1 = bitshift0(C, k*S, na);
    C1(R) = bitand_inexact(C1(R), bb(R), p, class(A), 1:(na-k));
    C1(L) = bitand_inexact(C1(L), bb(L), p, class(A), (k+1):na);
    C0 = bitand_inexact(C, bb_, p, class(A), 1:na);

    %%%%%%%%%% Compute sticky bit %%%%%%%%%%
    if sticky
        iR = min([na, pow2(i-1)]);
        iL = max([0, na-pow2(i-1)]);
        iL = na - iR;
        ilsb(:) = pow2a(iR, 'uint64') - 1;
        ilsb(L) = bitshift(ilsb(L), iL);
        C2 = bitand(C, ilsb);
        C2(R) = bitand_inexact(C2(R), bb(R), p, class(A), 1:iR);
        C2(L) = bitand_inexact(C2(L), bb(L), p, class(A), (iL+1):na);
        if i == 1
            stickybit = logical(C2);
        else
            C2(L) = bitshift(C2(L), -iL);
            C3 = any_high_bits_inexact_PBL(C2, iR, p);
            stickybit = or_inexact(stickybit, C3, p);
        end
    end
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    C(R) = bitor_inexact(C0(R), C1(R), p, class(A), 1:(na-k));
    C(L) = bitor_inexact(C0(L), C1(L), p, class(A), (k+1):na);
end

if (nargout >= 3) && sticky    % merge sticky bit with final output
    Cs = C;
    R = (S < 0);    % right shift
    sb0 = cast(stickybit, 'like', C);
    Cs(R) = bitor_inexact(Cs(R), sb0(R), p, class(C), 1);
    Cs(L) = bitset(Cs(L), na, or_inexact(bitget(Cs(L), na), sb0(L), p));
end
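A barrel shifter of this kind works in nb stages: stage i either passes the word through or shifts it by 2^(i-1), selected by bit i of the shift amount. The Python sketch below shows the error-free staging and the sticky bit. It is a simplified illustration, not the author's circuit model: the name `barrel_shift` is hypothetical, it handles scalars only, and the per-gate error injection is omitted.

```python
def barrel_shift(a: int, b: int, na: int, nb: int):
    """Shift the na-bit word a by b (left if b > 0, right if b < 0) in nb stages.

    Returns (c, sticky); sticky is True if any 1 bit was shifted out.
    """
    mask = (1 << na) - 1
    c = a & mask
    amount = abs(b)
    sticky = False
    for i in range(nb):                     # stage i shifts by 2**i or passes through
        if (amount >> i) & 1:
            k = 1 << i
            if b > 0:
                shifted = c << k
                lost = shifted >> na        # bits pushed past the top of the word
                c = shifted & mask
            else:
                lost = c & ((1 << k) - 1)   # bits dropped off the bottom
                c >>= k
            sticky = sticky or lost != 0
    return c, sticky
```

Because each stage is a layer of multiplexers, a shift by any amount below 2^nb passes through exactly nb layers, which fixes the number of gates exposed to error.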
9.6 Inexact Comparator


function [ AgtB, AltB, AeqB ] = comparator_inexact_PBL( n, A, B, p )
%comparator_inexact_PBL simulates an integer comparator
%using inexact digital logic, and returns logical values as
%follows:
%    AgtB = true if A>B, false otherwise,
%    AltB = true if A<B, false otherwise, and
%    AeqB = true if A=B, false otherwise.
%Each AND, OR, and NOT gate in the comparator is subject to
%a random error probability 1-p, consistent with the model
%of Probabilistic Boolean Logic.
%
%Inputs:
%n: (integer) Number of bits processed by the comparator.
%A, B: (integer arrays) Integers to be compared.
%p: (scalar) Probability of correctness of each gate within
%    the comparator. 0 <= p <= 1.
%
%Outputs:
%AgtB: (logical array) True if A>B, false otherwise.
%AltB: (logical array) True if A<B, false otherwise.
%AeqB: (logical array) True if A=B, false otherwise.
%
%Reference:
%L. N. B. Chakrapani and K. V. Palem, "A probabilistic
%Boolean logic and its meaning," Tech. Rep. TR-08-05, Rice
%University, Department of Computer Science, Jun 2008.

if ~exist('p','var')
    p = 1;
end

OxFFFF = bitcmp0(zeros('like', A), n);
A = bitand(A, OxFFFF);
B = bitand(B, OxFFFF);
Abar = bitxor_inexact(A, OxFFFF, p, class(A), 1:n);
Bbar = bitxor_inexact(B, OxFFFF, p, class(B), 1:n);
AbarB = bitand_inexact(Abar, B, p, class(A), 1:n);
ABbar = bitand_inexact(A, Bbar, p, class(A), 1:n);
AxnorB = bitnor_inexact(AbarB, ABbar, n, p, class(A), 1:n);

C = false(numel(A), n);
C(:,n) = bitget(AxnorB(:), n);
AgtB = false(size(A));
AgtB(:) = bitget(ABbar(:), n);

for i = (n-1) : -1 : 1
    C(:,i) = and_inexact(C(:,i+1), bitget(AxnorB(:), i), p);
end

for i = 1 : (n-1)
    AgtB(:) = or_inexact(AgtB(:), and(C(:,i+1), bitget(ABbar(:), i)), p);
end

if nargout >= 2
    AltB = false(size(A));
    AltB(:) = bitget(AbarB(:), n);
    for i = 1 : (n-1)
        AltB(:) = or_inexact(AltB(:), and(C(:,i+1), bitget(AbarB(:), i)), p);
    end
    if nargout >= 3
        AeqB = false(size(A));
        AeqB(:) = C(:,1);
    end
end
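The comparator decides A > B at the most significant bit position where the operands differ: all higher bits must be equal (the XNOR chain C) and that bit must be 1 in A and 0 in B. The Python sketch below walks through the same logic without gate errors. The name `compare` is hypothetical and the sketch handles scalars only.

```python
def compare(a: int, b: int, n: int):
    """Return (a_gt_b, a_lt_b, a_eq_b) for the low n bits, scanning from the MSB down."""
    gt = lt = False
    eq_above = True                          # True while all higher bits are equal
    for i in range(n - 1, -1, -1):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        gt = gt or (eq_above and ai == 1 and bi == 0)    # A AND (NOT B) at the first difference
        lt = lt or (eq_above and ai == 0 and bi == 1)    # (NOT A) AND B at the first difference
        eq_above = eq_above and ai == bi                 # XNOR chain
    return gt, lt, eq_above
```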
Appendix J. IEEE 754 Floating-Point Functions

10.1 Separate Floating-Point Number into its Components

function [ S, E, M, ne, nm, ee, mm ] = DecToIEEE754( X, fmt )
%BINARY16 (half precision)
%  Minimum value with graceful degradation: 6e-8
%  Minimum value with full precision:       6.104e-5
%  Maximum value:                           6.550e+4
%
%BINARY32 (single precision)
%  Minimum value with graceful degradation: 1.4e-45
%  Minimum value with full precision:       1.1754944e-38
%  Maximum value:                           3.4028235e+38
%
%BINARY64 (double precision)
%  Minimum value with graceful degradation: 4.9e-324
%  Minimum value with full precision:       2.225073858507201e-308
%  Maximum value:                           1.797693134862316e+308
%
%BINARY128 (quadruple precision)
%  Minimum value with graceful degradation:
%  Minimum value with full precision:
%  Maximum value:                           1.1897314953572318e+4932

if ~exist('fmt','var')
    fmt = 'BINARY32';
end

switch upper(fmt)
    case 'BINARY16'
        ebias = 15;
        ne = 5;
        nm = 10;
        E = zeros(size(X), 'uint8');
        M = zeros(size(X), 'uint16');
        mnm = 1024;
        mna = 1023;
    case 'BINARY32'
        ebias = 127;
        ne = 8;
        nm = 23;
        E = zeros(size(X), 'uint8');
        M = zeros(size(X), 'uint32');
        mnm = 8388608;
        mna = 8388607;
    case 'BINARY64'
        ebias = 1023;
        ne = 11;
        nm = 52;
        E = zeros(size(X), 'uint16');
        M = zeros(size(X), 'uint64');
        mnm = 4503599627370496;
        mna = 4503599627370495;
    case 'BINARY128'
        ebias = 16383;
        ne = 15;
        nm = 112;
        E = zeros(size(X), 'uint16');
        M = zeros(size(X), 'double');
        mnm = 5.1922968585348276e33;
        mna = 5.1922968585348276e33;
    otherwise
        error 'fmt must be binary16, binary32, binary64, or binary128.'
end

S = (X < 0);
X = abs(X);
ee = floor(log2(X));
i = (ee <= -ebias);    % graceful degradation toward zero
ee(i) = -ebias;
E(:) = ee + ebias;
mm = X ./ pow2(ee);
mm1 = mm;
mm1(~i) = mm1(~i) - 1;
M(~i) = mm1(~i) * mnm;
M(i) = 0.5 * mm1(i) * mnm;
c = (M >= mnm) & (~i);
E(c) = E(c) + 1;
M(c) = M(c) - mnm;

j = (isinf(X) | (ee > ebias));    % infinity
ee(j) = ebias + 1;
E(j) = ee(j) + ebias;
mm(j) = 0;
M(j) = 0;

k = isnan(X);    % NaN
ee(k) = ebias + 1;
E(k) = ee(k) + ebias;
mm(k) = mna;
M(k) = mna;

z = (X == 0);    % zero
ee(z) = 0;
mm(z) = 0;
E(z) = 0;
M(z) = 0;
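For the binary32 case, the sign, biased exponent, and mantissa fields can be cross-checked by reinterpreting the stored bits directly. The Python sketch below does this with the standard `struct` module. It is a different technique from the listing, which derives the fields arithmetically with log2 and floor, and the name `split_binary32` is hypothetical. The field widths (8 exponent bits, 23 mantissa bits, bias 127) match the BINARY32 branch above.

```python
import struct


def split_binary32(x: float):
    """Return (sign, biased_exponent, mantissa) of x rounded to IEEE 754 binary32."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))   # reinterpret the 32 bits
    sign = word >> 31
    exponent = (word >> 23) & 0xFF       # ne = 8 bits, bias 127
    mantissa = word & 0x7FFFFF           # nm = 23 bits
    return sign, exponent, mantissa
```

For example, 1.0 splits into sign 0, exponent 127, and mantissa 0, because the leading 1 of the significand is implicit.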
10.2 Merge Components into a Floating-Point Number

function X = IEEE754toDec( S, E, M, fmt, infnan )
%BINARY16 (half precision)
%  Minimum value with graceful degradation: 6e-8
%  Minimum value with full precision:       6.104e-5
%  Maximum value:                           6.550e+4
%
%BINARY32 (single precision)
%  Minimum value with graceful degradation: 1.4e-45
%  Minimum value with full precision:       1.1754944e-38
%  Maximum value:                           3.4028235e+38
%
%BINARY64 (double precision)
%  Minimum value with graceful degradation: 4.9e-324
%  Minimum value with full precision:       2.225073858507201e-308
%  Maximum value:                           1.797693134862316e+308
%
%BINARY128 (quadruple precision)
%  Minimum value with graceful degradation:
%  Minimum value with full precision:
%  Maximum value:                           1.1897314953572318e+4932

if ~exist('fmt','var')
    fmt = 'BINARY32';
end

if ~exist('infnan','var')
    infnan = true;
end

switch upper(fmt)
    case 'BINARY16'
        ebias = 15;
        einf = 31;
        mnm = 1024;
    case 'BINARY32'
        ebias = 127;
        einf = 255;
        mnm = 8388608;
    case 'BINARY64'
        ebias = 1023;
        einf = 2047;
        mnm = 4503599627370496;
    case 'BINARY128'
        ebias = 16383;
        einf = 32767;
        mnm = 5.1922968585348276e33;
    otherwise
        error 'fmt must be binary16, binary32, binary64, or binary128.'
end

if (isinteger(E) || isinteger(M)) && all(E(:) >= 0)
    ee = double(E) - ebias;
    i = (E <= 0);    % graceful degradation toward zero
    mm1 = zeros(size(M));
    mm1(~i) = double(M(~i)) / mnm;
    mm1(i) = 2 * double(M(i)) / mnm;
    mm = mm1;
    mm(~i) = mm(~i) + 1;
else
    ee = double(E);
    mm = double(M);
end

X = pow2(ee) .* mm;

j = ((E >= einf) & (M == 0));    % infinity
k = ((E >= einf) & (M > 0));     % NaN
if infnan
    X(j) = Inf;
    X(k) = NaN;
end

z = ((E == 0) & (M == 0));    % zero
X(z) = 0;

S = (S ~= 0);
X(S) = -X(S);

switch upper(fmt)
    case {'BINARY16','BINARY32'}
        X = cast(X, 'single');
end
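The merge step evaluates (1 + M/2^nm) * 2^(E - ebias) for normal numbers and drops the implicit leading 1 when E is zero. The Python sketch below writes that formula out for binary32 only. The name `join_binary32` is hypothetical, and the sketch always maps the all-ones exponent to Inf or NaN, which corresponds to `infnan = true` in the listing.

```python
def join_binary32(sign: int, exponent: int, mantissa: int) -> float:
    """Rebuild a value from binary32 fields (bias 127, 23-bit fraction)."""
    if exponent == 255:
        value = float("inf") if mantissa == 0 else float("nan")
    elif exponent == 0:
        value = (mantissa / 2**23) * 2.0 ** -126             # subnormal: no implicit 1
    else:
        value = (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)
    return -value if sign else value
```

The subnormal branch agrees with the listing, where the exponent is held at -ebias and the mantissa is doubled: 2^-127 * 2M/2^23 equals (M/2^23) * 2^-126.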